Hacker News

Give the data scientists SQL and relational algebra.

But please don't give them stored procedures and triggers.

Data scientist who was given stored procedures and triggers. Can confirm dangerous activities took place.

I get triggers, but what's wrong with procs?

Not who you are replying to, but I believe the issue is version-control unfriendliness. At worst, they'll write everything into the DB directly; at best, you'll have migrations, where you have to track down what the current piece of code is from a set of files.

You can, and I think you should, place your data and stored procedures in separate schemas. This way, migrations are only required for the data, and stored procedures can be deployed and version-controlled just like any other code.

What do you mean about version-control unfriendliness? Stored procedures, along with all of the other schema objects such as tables, views, UDFs, UDTFs, etc., can be version controlled.

You can do it, but it is unfriendly to that model. If you track migrations, you're tracking the changes on top of a change system, so you'll often have to track down which file has the latest version of something manually instead of relying on the VC layer.

If you don't track migrations, and track only the latest CREATE statement, you can't deploy to a real system with that.

If you track both migrations and a final state, you'll end up in an awkward situation, because you'll likely have to do your migrations first and rely on automation to render your final state (in how many files? under what criteria?) or manually copy-paste it without making mistakes while coming up with your own structure. Or worse, writing to the final-state set and then carefully copying changes back into migrations.

It's certainly not the worst thing in the world, but it is very unfriendly compared to the usual way of development under version control.


Okay... I see what you mean.

We've been experimenting with versioning schemas: deploying new versions of schemas alongside existing ones. There is never any question about what version of a stored procedure is in that schema. Same goes for all of the other schema objects. There is no schema migration tool needed. Of course I have to mention this is in a data warehouse scenario and not an operational data store.
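For illustration only: SQLite has no stored procedures, but its ATTACH mechanism can stand in for side-by-side schema versions like the ones described above. The `v1`/`v2` names and the `sales` table are invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each attached in-memory database acts like a separately deployed schema version.
conn.execute("ATTACH DATABASE ':memory:' AS v1")
conn.execute("ATTACH DATABASE ':memory:' AS v2")

# "Version 1" and "version 2" of the same object live side by side;
# v2 adds a column without migrating v1 in place.
conn.execute("CREATE TABLE v1.sales (amount REAL)")
conn.execute("CREATE TABLE v2.sales (amount REAL, region TEXT)")

conn.execute("INSERT INTO v1.sales VALUES (10.0)")
conn.execute("INSERT INTO v2.sales VALUES (10.0, 'EU')")

# Consumers pick a schema version explicitly; no migration tool involved.
total_v1 = conn.execute("SELECT SUM(amount) FROM v1.sales").fetchone()[0]
total_v2 = conn.execute("SELECT SUM(amount) FROM v2.sales").fetchone()[0]
print(total_v1, total_v2)  # 10.0 10.0
```

Readers of the old version keep working untouched while the new version is rolled out, which is the property that removes the need for a migration tool.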


You CAN track the latest "CREATE" statement and deploy to a real system. Conceptually you only need to do a "diff" between your current "CREATE" and the one in the system where you are deploying. https://github.com/djrobstep/migra
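The idea behind a diff tool like migra can be sketched in a few lines: compare the CREATE statements you have checked in against what the live system reports. This toy version uses SQLite's `sqlite_master` catalog (migra itself targets PostgreSQL), and the `users` table is made up:

```python
import sqlite3

def schema_of(db):
    # sqlite_master holds the CREATE statements actually in effect.
    rows = db.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return dict(rows)

# "Declared" state: what the checked-in CREATE statements say.
declared = sqlite3.connect(":memory:")
declared.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# "Deployed" state: what the live system currently has.
deployed = sqlite3.connect(":memory:")
deployed.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")

want, have = schema_of(declared), schema_of(deployed)
for name in want:
    if name not in have:
        print(f"missing table: {name}")
    elif want[name] != have[name]:
        print(f"table differs: {name}")  # prints "table differs: users"
```

A real tool then turns each difference into an ALTER statement; the hard part it handles is generating those statements safely, not spotting the differences.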

Also something similar which uses apgdiff https://github.com/subzerocloud/subzero-cli


You bring up an excellent point. I had forgotten about auto-migrators; they can encourage being less conscious of the impact of potentially major changes to data, since reading a tool's output is less cognitively engaging than writing the migration yourself.

But stored procedures are another matter; this could work in a fairly reliable way. Have you used this in any project with other people? How does it work out?


(genuine question) What are the best alternatives to triggers? And what makes them a bad idea?

I'm pretty much with you on stored procedures.


I am sure there are good use cases for triggers but I have seen quite a few databases that used triggers a lot and it almost always felt like the equivalent of spaghetti code. Things are happening and it takes forever to figure out why they are happening.

Depends on your perspective. Yes, if you are an application developer with weaker database skills trying to follow the data integrity logic in the database. No, if you are a skilled database person with weaker knowledge of the source code of all the applications (there could be several) sharing the database. Things are happening and it takes forever poring through each app's code to find its data integrity logic to figure out why they are happening.

By centralizing data integrity logic in the database, you know where to look and that all apps using the database will abide by it.
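A minimal sketch of that centralization, using an SQLite trigger (the `accounts` table and the non-negative-balance rule are invented for the example). Every client that writes through the database hits the same check, regardless of which application it is:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL);

-- The integrity rule lives once, in the database, not in N applications.
CREATE TRIGGER no_negative_balance
BEFORE UPDATE ON accounts
WHEN NEW.balance < 0
BEGIN
    SELECT RAISE(ABORT, 'balance may not go negative');
END;
""")

conn.execute("INSERT INTO accounts (balance) VALUES (100.0)")
try:
    conn.execute("UPDATE accounts SET balance = -5 WHERE id = 1")
except sqlite3.IntegrityError as e:
    print(e)  # balance may not go negative
```

An app that forgets the rule still can't violate it, which is exactly the guarantee being argued for here.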


I agree with you but in the cases I saw I felt that the triggers were used to fix problems in the code more than being part of a consistent data strategy. I admire well designed databases but unfortunately there aren’t too many of them out there.

I think part of the problem is that there is still this huge chasm between good coding skills and good database skills. It’s hard to have both.


Triggers definitely have their place and can be very useful. But they can pretty quickly grow out of control leading to lots of side effects making it difficult to maintain and possibly leading to performance issues. I think twice before adding a trigger to consider if it is really necessary because using triggers as your first tool to solve problems can lead to a lot of complexity.

I like the author's point about this:

> The trigger function itself was also written in the “schema.sql” file which meant that everything was clear and documented along with the other relevant schema information, thus easier to track and manage.

In my experience, in most cases the application doing the insert/update/delete is the best place to handle most side effects. That may be an on-demand API, a scheduled/queued job, etc. I've found triggers to be helpful and simple for audit and history tracking. There are definitely plenty of cases where triggers are useful and it probably depends a lot on your architecture.
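A small audit-trail example along those lines, sketched with SQLite (the `items` table and trigger name are made up). The history row is written in the same transaction as the change, with no application code involved:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL);

CREATE TABLE items_audit (
    item_id    INTEGER,
    old_price  REAL,
    new_price  REAL,
    changed_at TEXT DEFAULT (datetime('now'))
);

-- Record every price change automatically, whoever issues the UPDATE.
CREATE TRIGGER items_audit_update
AFTER UPDATE OF price ON items
BEGIN
    INSERT INTO items_audit (item_id, old_price, new_price)
    VALUES (OLD.id, OLD.price, NEW.price);
END;
""")

conn.execute("INSERT INTO items (price) VALUES (9.99)")
conn.execute("UPDATE items SET price = 12.50 WHERE id = 1")
history = conn.execute("SELECT old_price, new_price FROM items_audit").fetchall()
print(history)  # [(9.99, 12.5)]
```

This is the "helpful and simple" case: the trigger only appends to a side table and has no further side effects, so it avoids the spaghetti problem discussed above.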


We're talking about "data scientists" for whom software development is not their primary focus. SPs and triggers do not share SQL's advantage of being declarative; they are imperative, and non-developers should spend as little time as possible in imperative domains.

I'm not as anti-trigger as I am anti-SP. Though I stay away from triggers in any case, as the rest of the world is fairly negative on them.


For which usecase(s)? Triggers seem to be used for many different things...


