In particular, Git has recently made improvements for working with giant (as opposed to merely big, or still-too-humongous-to-manage) repos, so it's now workable after some massaging of the config.
The other alternative to Perforce is SVN. There have been some improvements that make it more workable in today's git/"decentralized" world. Not knowing your particular use case, it's hard to say whether it's at all appropriate.
I found it a lot in small to medium-sized businesses that were transitioning from the couple of people who wrote all their software to a team that needed to do things properly and consistently. But I also found it in 50-person dev teams, which was weird and the worst.
I've generally always managed it as code, just easier that way.
Most likely they're happy to offload version control to add-ins. There are a few, but most of them are way out of budget for small businesses or small IT departments. Which means version control usually doesn't happen.
This has been such a problem for me with clients that I started a side project to try to help fill the gap. If anything it's shown me that the problem is more widespread than I originally thought.
My previous employer created stored procedures in an MSSQL database and called it a day.
So if it had been a traditional MVC design (which it really wasn't), both the model and the controller would have lived in the DB, and the Java bits only handled the view part, with JSP and some other outdated tech. Rendering a view basically involved calling a DB stored procedure which prepared everything that should be visible and returned the data to the "fake back end" for further processing before it got sent off to the browser. Saving new data or editing existing data involved the same approach, but with numerous function parameters (think 20 to 40).
On the bright side: it worked fast.
On the not so bright side: everything else was horrible. The Java code was nightmarish and badly maintained, and its discoverability was awful, since it also had to name things similarly to how they were named in the DB, with unreasonable identifier length limits. The DB code was almost impossible to version well (no automatic migrations to speak of), routinely broke, and had little to no logging in place, since most DBMSes handle the concept of "application logs" pretty badly unless you write your own, which the other devs hadn't done. Oh, and you can forget about putting a breakpoint in those stored procedures, or even having Apache2/Nginx/whatever tell you what was being called for any given action. It really taught me a thing or two about whether I should eagerly agree to help with code that other companies have developed and someone now needs to fix/improve.
Since then, I've also seen all sorts of systems: some that have automated migrations but no baseline; some that have so much data and complexity that everyone has to share a test DB instance; others where the migrations are automated but there are no seeding scripts, so a newly initialized DB instance has the correct structure for local development but the system isn't usable because there's no data. Of course, I've fixed what I could over the years, but there are very few approaches that truly work.
The most functional approach I've seen: automatic DB migration scripts in the project (versioned in VCS) from day 1; each developer having their own local DB instance, with seeding scripts for it as well, so they can make breaking changes locally and test them as often as they like without fear of ruining the schema (after all, locally you can just wipe the schema and data, run the stable migration/seeding scripts, and continue from there); and no manual changes to schema or data on DB servers (except in an emergency, and even then the same changes should be captured in scripts afterwards).
Additionally: it's useful to have scripts that won't fail if you run them more than once (check whether the data needs to be altered first, do so if necessary, otherwise do nothing, and maybe write a record to some log table).
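For example, a minimal sketch of such a re-runnable script, in Postgres-flavoured SQL (the customers table and loyalty_tier column are made up):

    -- Safe to run any number of times: only adds the column if it's missing.
    ALTER TABLE customers ADD COLUMN IF NOT EXISTS loyalty_tier text;

    -- For data changes, check first and only touch the rows that need it:
    UPDATE customers
    SET loyalty_tier = 'standard'
    WHERE loyalty_tier IS NULL;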
Optionally: it can be useful to have the ability to reverse migrations, but sometimes that's not easy to do and roughly doubles the total work, depending on the complexity of your migrations. A forward-only approach has been sufficient for most of my projects.
Optionally: I've also explored model-driven development with MySQL Workbench, where I made all of the models in ER diagrams and used forward engineering to get the SQL for the scripts. It was a wonderfully nice approach, but pgAdmin, SQL Workbench, and others don't really support anything like it, so it's a niche concept: https://dev.mysql.com/doc/workbench/en/wb-forward-engineerin...
However, as a data engineer I've been at companies that treat SQL differently, and I've seen the bad practices I believe the article is addressing, like putting SQL statements in stored procedures or materialized views instead of in git.
But stored procedures give you a single place to call SQL. If there's a standard calculation used by 5 reports, put it in a function or a stored procedure and have every report call that procedure/function.
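For example, a sketch in Postgres syntax (the orders table, columns, and monthly_revenue name are all invented), where the calculation lives in exactly one place:

    CREATE OR REPLACE FUNCTION monthly_revenue(p_month date)
    RETURNS numeric AS $$
        SELECT COALESCE(SUM(amount), 0)
        FROM orders
        WHERE date_trunc('month', order_date) = date_trunc('month', p_month::timestamp);
    $$ LANGUAGE sql STABLE;

    -- All 5 reports call the same definition:
    SELECT monthly_revenue(DATE '2023-01-01');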
To recreate the database up to the current schema, just check out the list of schema delta files and play them back one by one, in the order of the version in the file names.
Track any prepopulated table data the same way.
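For instance, one possible layout (the names are purely illustrative):

    db/deltas/
        v001_create_users.sql
        v002_add_email_to_users.sql
        v003_seed_country_codes.sql   (prepopulated table data, same scheme)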
Edit: having a table called schema_version to track the currently applied schema version of the database is very helpful. Then you can upgrade the DB schema from any version, rather than only from the base schema, by checking the schema_version table before applying the versioned schema delta files. Be sure to put an insert statement at the end of every versioned schema script file, adding a row containing that file's version to the schema_version table.
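A minimal sketch of that bookkeeping, in generic SQL (the column names are illustrative):

    CREATE TABLE schema_version (
        version    int PRIMARY KEY,
        applied_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Last statement of v002_add_email_to_users.sql:
    INSERT INTO schema_version (version) VALUES (2);

    -- Before applying deltas, see where this database currently stands:
    SELECT MAX(version) FROM schema_version;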
Edit: Or you might choose to be explicit about the versions of the delta files in a branch. Each branch has its own delta file naming format. When merging into main, roll the changes from all the branch's delta files into a new main delta file. If main has moved forward a lot, you have to fold the changes in carefully.
I'm the author of declarative schema management tool skeema (https://www.skeema.io, for MySQL / MariaDB). Some other options in this space are sqldef (https://github.com/k0kubun/sqldef, for MySQL or Postgres) and migra (https://github.com/djrobstep/migra, for Postgres). In MS SQL Server, SSDT DACPACs are also somewhat similar.
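For anyone unfamiliar with the declarative workflow, the rough shape is: keep the desired CREATE statements in the repo and let the tool compute and apply the DDL needed to reach that state (the users table below is just an example):

    -- tables/users.sql: the desired end state, versioned in git
    CREATE TABLE users (
        id    bigint unsigned NOT NULL AUTO_INCREMENT,
        email varchar(255) NOT NULL,
        PRIMARY KEY (id)
    );

    $ skeema diff   # shows the DDL needed to get the live schema to this state
    $ skeema push   # executes it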
But your woes seem more related to the transactional systems.
Of course, recently I've also found Dbmate (https://github.com/amacneil/dbmate), a lightweight and language-agnostic tool that supports a variety of DBMSes (PostgreSQL, MySQL/MariaDB, SQLite, and more recently even ClickHouse) and can be used for running SQL migration scripts regardless of which technologies a particular app uses (Java, .NET, Node, PHP, and so on).
Now, setting up the CI might be a little bit more difficult, but having one tool across all of the services in your architecture can be worth it!
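To give a flavour of it (the migration below is invented): each Dbmate migration is a plain SQL file with up/down sections, and the connection string comes from the DATABASE_URL environment variable.

    -- db/migrations/20230101120000_add_status_to_orders.sql
    -- migrate:up
    ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';

    -- migrate:down
    ALTER TABLE orders DROP COLUMN status;

    $ dbmate up         # creates the DB if needed and applies pending migrations
    $ dbmate rollback   # undoes the most recent migration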
- DML is declarative; actually, though, I prefer to think of my query (or update) in terms of a pipeline splicing tuples together, filtering, etc., and I'd be happier if I could write it as such, i.e. in a more imperative style (to express intent, I mean; the planner should still be free to do all sorts of clever shenanigans, as now)
- DDL, otoh, is imperative: "add this index", "remove this column". But here I want a declarative style! i.e. "I would like my table to look like this - make it so". A lot of the aggravation with database maintenance comes from having to manage a stack of non-idempotent DDL operations in order to wrestle the db into the required shape.
But for analytics systems, you can get close to having declarative, idempotent DDL by using CREATE OR REPLACE VIEW ... AS SELECT for derived data. Some systems even allow a transactional, and likewise idempotent, DROP TABLE IF EXISTS; CREATE TABLE ... AS SELECT.
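A sketch of what that looks like (Postgres-ish syntax, invented table/view names):

    -- Re-runnable: always converges on the same definition.
    CREATE OR REPLACE VIEW daily_sales AS
    SELECT order_date::date AS day, SUM(amount) AS revenue
    FROM orders
    GROUP BY 1;

    -- Where the engine supports transactional DDL, this pair is idempotent too:
    DROP TABLE IF EXISTS daily_sales_snapshot;
    CREATE TABLE daily_sales_snapshot AS SELECT * FROM daily_sales;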
This doesn't in any way help with indexes, or with tables that are primary storage and can't just be dropped and recreated, and so on, so there is definitely something missing.
Seems like a very basic and common approach. How do shops that don’t use it track database changes? Can’t remember the last time I was involved in a (non-legacy) project that did things differently.
Because...it's a programming language, and therefore...it is code?
It's also literally the classic example of a "fourth-generation programming language."
No version control, no centralized overview of queries or dependencies, no code linting or auto-completion, etc.
Normally these tools are chosen by inexperienced, rather non-technical data or business people, to whom the initial ease of development is highly appealing. Of course, complexity quickly spirals out of control as the LoC across all kinds of different transformation steps increase.
I would love to get every office user to do more with their computer than copy-pasting all day, but I've come to believe that the problem is mentality, not tooling.
There is a reason why no-code data wrangling tools are a multi-billion dollar industry.
I wrote the article because many companies treat SQL like excel files. It gets emailed around, unknowingly forked and modified, saved on desktops or shared file storage, and nobody knows how to fix anything.
For everyone who said “SQL is code, is this news?” - I agree with you. It is code. And it shouldn’t be news. And I hope all companies implement version control for SQL. I see Java version controlled at a significantly higher rate than SQL. That’s all. Good day.
Spend the time to migrate scripts from local locations to global locations, eliminate views, materialized views, and stored procedures.
Wow. So things will be better if we don't use the modularization you can get through views? I've just finished up a tool* to make it easy to keep code for SQL views in VCS, but I must have been thoroughly misguided ;-)
Though the real argument is that Tableau Workbooks, Alteryx ETL-graph-things, and the whole panoply of opaque proprietary formats that contain, in their heart of hearts, a bunch of SQL are also themselves "code" and deserve the same testing, source control, code review, and deployment practices as the rest.
SQL is code.
So what else would I treat it like?
I've worked at places that threw scripts into a folder somewhere in a project, so that they would at least have history. Again, change control was a disaster.
If you're not using a change control tool of some sort specifically for databases, and using it and only it to manage all changes, it's going to be a disaster.
I still remember the time when it was deemed normal, or at least common, to email diffs and patches around a dev team, or to upload everything to the "code server" in little directories with ISO timestamps as names.
Not taking care of code properly, i.e. keeping it in a VCS, is a disaster, no matter whether it's C, SQL, or utility bash scripts.
At each step, the code gets committed to a git repository.