4) is same as three but you take a subset of the data with correct relations, anonymise it carefully and use that, it has a lot of complexity at scale but is about as faithful as I’ve seen, it’s also one of those tools that forever requires maintenance as the core db mutates.
For local devs, seeding the database with plausible correct data works pretty well.
The big benefit here is not just your staging database, but everybody's locally running databases also get prepopulated data which is really useful for development and testing.
Have they gone through the experience of having a persistent staging environment that slowly drifts from production (1)? If they haven't, they can't possibly understand why that is a bad idea. I'll just go with the flow until they realize. Maybe I'll hint at the possible issues sometimes.
Have they figured out that copying prod is a bad idea (3)? If they haven't, same thing. They can't understand why that sucks and why that's not true reproducibility.
Finally, (2). Fixtures! It's also a journey. There are so many things that can go wrong. Knowing those things depends on having gone through those journeys with a persistent staging and production copies.
There is no relief from the pain. No magic bullet. No product or solution that will ever solve this. You have to go through those stages. If you're lucky, someone will guide you through them (in practice). The journey can be sped up, but I haven't seen a shortcut that works (like forcing the team to adopt a practice without them internalizing it).
That is database schema. The post asks for opinions on database contents. They are related though, so it's a valid question.
> I suppose you include a script to update staging
Essentially, yes. But if I write a script to do it on staging, why don't I write one for prod as well? Maybe I could even go beyond that. Make a simple tool to handle those scripts for me (that's migrations).
It requires some version control and deployment discipline, which is something you probably wouldn't have if you're adding columns directly on the prod db.
I learned very early that no developer should have access to prod db. Not even read access. No writing, definitely. Schema changes by a single developer outside version control should be impossible.
4) is same as three but you take a subset of the data with correct relations, anonymise it carefully and use that, it has a lot of complexity at scale but is about as faithful as I’ve seen, it’s also one of those tools that forever requires maintenance as the core db mutates.
For local devs, seeding the database with plausible correct data works pretty well.
Also 2, as part of migrations:
There 3 types of migration files/scripts: structure, basic-data, dummy-data.
structure - new table, add column goes here basic-data - e.g. default config values go here dummy-data - gets used on local and stating
Run migrations with a flag to include the dummy-data migrations.
+1 for this approach.
The big benefit here is not just your staging database, but everybody's locally running databases also get prepopulated data which is really useful for development and testing.
It depends on where the rest of the team is.
Have they gone through the experience of having a persistent staging environment that slowly drifts from production (1)? If they haven't, they can't possibly understand why that is a bad idea. I'll just go with the flow until they realize. Maybe I'll hint at the possible issues sometimes.
Have they figured out that copying prod is a bad idea (3)? If they haven't, same thing. They can't understand why that sucks and why that's not true reproducibility.
Finally, (2). Fixtures! It's also a journey. There are so many things that can go wrong. Knowing those things depends on having gone through those journeys with a persistent staging and production copies.
There is no relief from the pain. No magic bullet. No product or solution that will ever solve this. You have to go through those stages. If you're lucky, someone will guide you through them (in practice). The journey can be sped up, but I haven't seen a shortcut that works (like forcing the team to adopt a practice without them internalizing it).
If you go with (2), and you add a DB column in a PR, I suppose you include a script to update staging with the new content at the same time?
That is database schema. The post asks for opinions on database contents. They are related though, so it's a valid question.
> I suppose you include a script to update staging
Essentially, yes. But if I write a script to do it on staging, why don't I write one for prod as well? Maybe I could even go beyond that. Make a simple tool to handle those scripts for me (that's migrations).
It requires some version control and deployment discipline, which is something you probably wouldn't have if you're adding columns directly on the prod db.
I learned very early that no developer should have access to prod db. Not even read access. No writing, definitely. Schema changes by a single developer outside version control should be impossible.