We looked for some obvious optimizations like eager loading and some map reduce magic and implemented them. That brought us to a range of 45 minutes to 2 hours.
We decided that it was still not good enough. We moved ALL of the report into SQL functions (Postgres) and we have the whole function running in 27 seconds.
I agree with some of the top comments that maintaining this is a little trickier than Ruby code, but here are some ideas we came up with for maintenance:
1) Write the SQL in a ruby file that has a class method of sorts that returns the generated function. This allows version control to track changes to the SQL function.
2) Focus on tests at the interface level. We had great test coverage on the end result prior to moving to SQL, and we were able to reuse those tests to verify the accuracy of calculations that are now in SQL.
This really isn't for everyone, but for those that want to try something new, it is worth it.
If a programming language wants to stay fast it must eventually become a database. I realize this is an unpopular opinion but popularity is the wrong metric to judge by.
Admittedly the code I was moving away from was hilariously bad. Threaded custom build of PHP bad. Then again, I haven't tried to optimise the code I wrote at all.
You are absolutely right about fast code becoming a database; this is simply down to the query planner - it can try to do the least possible work for the data you actually have.
I implemented a small function where you could list the tables that you were working with, and it would create empty temporary tables with the same name in a namespace that takes precedence over the normal namespace in postgres' search_path, therefore giving you a blank slate to work with for the unit test, while the other tables were still populated with useful data. (Some of these data come from expensive infrastructure scanning jobs, so starting from an entirely empty database was not an option.)
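A minimal sketch of that shadowing trick (the schema and table names here are hypothetical):

```sql
-- Create an empty copy of a table in a scratch schema; putting that
-- schema first in search_path makes unqualified references resolve to it.
CREATE SCHEMA IF NOT EXISTS test_shadow;
CREATE TABLE test_shadow.scan_results
    (LIKE public.scan_results INCLUDING ALL);
SET search_path TO test_shadow, public;
-- "scan_results" now refers to the empty shadow table, while tables not
-- listed for shadowing still resolve to their populated versions in public.
```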
Just curious - not sure if you were using Ruby like the parent above. I literally just built a library to back ActiveRecord models with temporary tables. It's very useful, since you can then use AR scopes off of your temporary table (I had to do this instead of using CTEs because my DB currently doesn't support CTEs).
Just thought I'd share it in case it helps, or to hear if anyone else has a better approach to this. (warning: it's very experimental and not production tested yet). I couldn't find any other gem that would do this.
How did you resist the temptation to call it temptation?
Although I think temptable is almost the opposite of contemptible, therefore also good going in the pun department.
My code is all Ruby, and I ended up pushing all the work into SQL and just eventually selecting with find_by_sql into existing models for the results. There is possibly concurrent invocation vs updates and a race against itself, so it's also all wrapped with a serializable transaction and starts with a mutex lock on the controlling model.
The approach I went with works really well for cases when you want to persist the temporary table through the end of the request (it works well for master/index type views that may have a lot of sums/counts and filter options available on them).
Unfortunately I don't think I have the skills for that, so posting here in the hopes that someone that can likes the idea :)
One thing that makes improving it cumbersome is that the semantics of many operations are slightly different in the database than in C#. For example, SQL Server doesn't do NaNs and infinities, supports a zillion string collations, and its date types may not 100% map to C# ones.
Also, databases may run stored procedures faster than the SQL that LINQ generates on the fly because they can cache their query plans (most databases will detect that an on-the-fly query is equal to one run recently, though, so this may be somewhat of a moot argument)
Sometimes I think I should just be using sql.. (which we do on other projects)
Sometimes it's really just easier to write SQL directly.
In order to get good performance, you want to minimize the number of back-and-forth requests over the network. So instead of making a request to obtain a list of items and then fetching each item one by one (the so-called N+1 select issue), you will get better performance if you make the database send only the data you want - nothing more, nothing less.
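For example (with illustrative table names), the N+1 pattern issues one query per item, while a single join returns everything in one round trip:

```sql
-- N+1: one query for the list, then one query per item
--   SELECT id FROM orders WHERE customer_id = 42;
--   SELECT * FROM line_items WHERE order_id = ?;   -- repeated N times
-- Single round trip instead:
SELECT o.id AS order_id, li.product, li.quantity
FROM orders o
JOIN line_items li ON li.order_id = o.id
WHERE o.customer_id = 42;
```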
ZODB itself is essentially a transactional single-object store, where that "single object" is usually the root node of an arbitrary (possibly cyclic) object graph. Storage depends on the backend, nowadays you'd usually use file storage for development and relstorage for any kind of deployment.
It doesn't have a query language of its own; Python is the query language. There are no indices beyond those that you explicitly create (eg. a dictionary would be a simple index, but other packages provide more complex indices, like B-Trees). The nice thing here is that anything that goes into the DB is fully transacted, which removes a lot of the headache you'd have with other solutions (eg. flat XML files).
ZODB thrives when your data model is too complex to easily or efficiently map into SQL, when you don't do or need efficient complex ad-hoc queries, and when raw performance and concurrency (although the latter was recently improved through the MVCC reimplementation) aren't the highest priority. Since it's pretty good at caching (much better than most ORMs), performance usually doesn't suck.
1. I believe they added SIMD support a year or two ago and got a massive (2-10× IIRC) speedup.
You... realize you could just put the SQL itself in git, right? Not saying there aren't potential benefits, but "it can go in version control" isn't one of them.
Sorry for not explaining that fully.
I suppose you could also execute CREATE OR REPLACE FUNCTION and redefine the function in a new migration each time you want to change it. The downside is that you end up with the whole function in a new migration every time you change it. But if you're not updating your stored procedures very frequently (which is usually the case, IME), it probably wouldn't be too bad.
The system works pretty well.
Just start the second file with DROP SCHEMA IF EXISTS schemaname CASCADE; CREATE SCHEMA schemaname;. Then, make sure any functions/views/permissions built on top of this schema are also contained within this file.
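A sketch of that second file, with hypothetical schema, view, function, and role names - everything in the schema is rebuilt from scratch on each deploy:

```sql
DROP SCHEMA IF EXISTS api CASCADE;
CREATE SCHEMA api;

-- Everything built on top of this schema lives in this same file,
-- so the whole unit is dropped and recreated together.
CREATE VIEW api.active_users AS
    SELECT id, name FROM public.users WHERE deleted_at IS NULL;

CREATE FUNCTION api.user_count() RETURNS bigint
    LANGUAGE sql STABLE
    AS $$ SELECT count(*) FROM api.active_users $$;

GRANT USAGE ON SCHEMA api TO app_role;
GRANT SELECT ON ALL TABLES IN SCHEMA api TO app_role;
```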
Now it's an ops problem :-)
Why not just load all the data from SQL into a DataFrame in Python or R and do the processing there? I've written some pretty big R DataFrame crunching scripts, including very wide-window moving averages operating on tens of billions of records, that run in less than five minutes on my laptop.
Hell, I have a web app built with Flask+Pandas, running on a single t2.small instance, that does complex queries into a 6000-record chemistry data set based on user input of selectors. It can serve thousands of simultaneous users without noticeable slowdown.
What am I missing here?
In our original Ruby process, it would return just that: 240 records per join, and aggregate on that. Our SQL process aggregates that in a stored procedure and returns the flat 4000 records.
We really didn't want to introduce another piece of technology to this stack, hence we didn't look at R or Python or Elixir. We did consider storing all of this inside elasticsearch for a hot second but keeping things in sync at the correct times seemed sub optimal and would add a layer of complexity we were all uncomfortable with.
The absolute worst case of a "join" is a cartesian product which runs in quadratic time. 6000^2 = 36000000, which should still easily finish in ~1 second.
I don't know the Rails equivalent, but Django's ORM has a way of prefetching related fields on a model in a single query.
My bet is that your code wasn't 1 query, but actually 4000 * 240 queries, which will always take a while.
EDIT: I just saw in a sibling comment that the actual data fetching was only taking 30s. I imagine you were doing that part mostly right
You wouldn't likely put the whole app on the database server, but PostgreSQL typically makes good use of the available RAM and disk while leaving lots of CPU cores idle. So running some R and Python on the PostgreSQL server is an overall improvement in performance.
P.S. Yes, I know that PostgreSQL 9.6.1 will use multiple cores for aggregate and join queries on large tables, but few people have upgraded to that yet.
One way to think about this is that you just want to take control back from the ORM of the exact database queries sent down the pipe. Which, BTW, is why every ORM worth its salt has functions for dropping down to plain SQL - teach that as best practice for doing computation in SQL queries, not stored procedures.
At the end of the day, I would just use the tool the team is most comfortable with on the project. You can obviously start with an ORM, but there may come a point where performance needs a little boost.
If you think you can't version control stored procs, then you are doing something badly, badly wrong.
What's the difference between a "script" which is a text file on a filesystem at the end of the day, versus a "stored proc" which is a block of text in a database, loaded off a file in a filesystem?
Or your stored procedures now become one more thing that has to be backwards compatible, in addition to your DB schema...
I've run into all of these problems with stored procedures, and they SUCK. Good deployment tooling can solve them, but it's a lot of effort for questionable reward.
It's a small amount of effort for a thousand-fold increase in performance and not having to re-implement the same functionality in every app and in every language that connects to that DB. YMMV.
If you're not already version controlling your schema, you should seriously consider starting whether you're using sprocs or not.
There are other patterns you can use to track via version control the changes for your stored procedure.
Always pure SQL, at least as long as you're using a DBMS. Never anything else.
SQL can stretch to full Turing completeness, so it'll always be good enough for that. And between JSON support and PL/pgSQL, most everything can be done in Postgres.
Plus it solves the whole schema impedance mismatch thing to just leave it up to the DB.
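As a toy illustration of that expressiveness, recursive CTEs alone get you general computation - here, Fibonacci numbers in plain SQL:

```sql
-- Each step carries the pair (fib(n), fib(n+1)) forward.
WITH RECURSIVE fib(n, a, b) AS (
    SELECT 1, 0::bigint, 1::bigint
    UNION ALL
    SELECT n + 1, b, a + b FROM fib WHERE n < 10
)
SELECT n, a AS fib_n FROM fib;
```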
ORMs aren't that great. Having spent the past five years primarily with an object database that is also fully SQL (ISC Caché) I find myself preferring SQL more and more. Even when you have the ORM within the bloody database I'm going SQL.
The string thing isn't an SQL problem, it's a tooling one.
The only tool per se should be pgAdmin or an equivalent. And in your application the only "SQL" that should be written is SELECT * FROM viewforthisfunctionality and SELECT runupsertwithjsonobjectthatpgorplpgsqlcanreadjustfine(json)
What's an "anonymous table of records"? Do you mean it's just a result set (i.e. query)?
I've seen some huge result sets (thousands of rows) returned to do a simple aggregation for a dashboard widget.
I kept that CPU usage chart and added it to my CV as it was quite impressive to look at.
Honestly, instead of looking into more performance gains, I wanted the team to get a huge win and for the team to look good in front of senior management :D.
Using functions as an intermediary to raw queries also decouples the SQL details from the application. What I mean is that if the application calls an SQL function to read, update, or insert data, then you can revise the SQL functions independently of the application code. In other words, if you refactor an SQL function, you can deploy the change to the database without rebuilding and deploying the application.
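A sketch of that pattern (the names here are invented for illustration): the application only ever calls get_customer(id), so the body can later be rewritten with CREATE OR REPLACE FUNCTION without touching application code:

```sql
CREATE OR REPLACE FUNCTION get_customer(p_id int)
RETURNS TABLE (id int, name text, order_total numeric)
LANGUAGE sql STABLE AS $$
    -- The join and aggregation are an implementation detail; callers
    -- only see the declared result shape.
    SELECT c.id, c.name, coalesce(sum(o.amount), 0)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE c.id = p_id
    GROUP BY c.id, c.name
$$;
```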
This is especially helpful in companies that have an onerous verification and validation process for application changes, because often "performance improvements" are classified as administrative changes which can be done with less bureaucracy.
So, embrace PostgreSQL as a permanent part of your application and reap the benefits of its many powerful features.
I recently refactored several functions which had kind of organically grown from several different functions, each with its own SQL query, into one big query.
Not only did performance increase by a significant amount, the whole function is basically just "run this query and return the result", as the JSON functions in Postgres allow me to leave the actual processing in the database and avoid the always-annoying "loop over the 30,000 rows that were returned to build a JSON file" thing, which is never fast.
I'm not even sure why people would use something like Django Rest Framework, when Postgres outputs JSON directly from a query.
Anything more complicated than that, and I'd recommend a simple Go application.
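For instance, one query can hand back the finished JSON document (the schema here is illustrative):

```sql
-- Aggregate rows straight into a JSON array server-side, instead of
-- looping over thousands of returned rows in application code.
SELECT json_agg(json_build_object(
    'id',    id,
    'name',  name,
    'score', score
))
FROM players
WHERE score > 100;
```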
- Version control. The SQL functions end up just being data in your DB. This feels like an anti-feature in the world of continuous integration and code review.
- Lack of composability. An ORM gives you the ability to easily compose filters. This lets me write functions like "take this query, and make sure it's properly filtered down to one user's data". Because of how SQL works, each query needs to be hand-crafted, and it's hard to make parts of it generic.
- Rewriting business logic. I have business logic in Python already, but if I want to query off of that in the DB in SQL, now I need two implementations. You can sort of get around this in an ORM by annotations.
- I'm not sure what the developer tooling environment is like. PyCharm is pretttty nice.
To be honest, you citing that it's easy to get around the validation process when using Postgres is a major red flag. Why would I want to circumvent code review for things that are most undoubtedly code?
Functions can be treated the same way as migrations. With PostgreSQL you have transactional DDL, so you can atomically deploy and roll back functions. There are plenty of different script packages to help manage this. At the end of the day it's deploy-and-rollback SQL scripts in text files, which are easily managed with version control.
> Lack of composability. An ORM gives you the ability to easily compose filters. This lets me write functions like "take this query, and make sure it's properly filtered down to one user's data". Because of how SQL works, each query needs to be hand-crafted, and it's hard to make parts of it generic.
The trade-off here is that you’re trusting the ORM to produce an optimal query (or queries) for whatever you’re trying to do, compared with writing the SQL yourself with knowledge of your schema. Like any trade-off, it’s one you need to decide for yourself.
You may have filtering on the data you’re returning from the database that you chose not to do in the query itself. At this point, you should just be dealing with data structures. I don’t think this would be different whether you’re using an ORM or not.
> Rewriting business logic. I have business logic in Python already, but if I want to query off of that in the DB in SQL, now I need two implementations. You can sort of get around this in an ORM by annotations.
I’m not sure what you’re getting at here, so I’m not sure if this addresses your concerns: if you’ve got clean separation of your code, you should only need to update a single location for any given piece of logic. Single source of truth, and all that.
> I'm not sure what the developer tooling environment is like. PyCharm is pretttty nice.
SQL scripts are generally text files, so anything that supports some kind of SQL syntax highlighting is likely all you need. I suspect PyCharm will work; the JetBrains folks are generally good about that sort of thing.
> Why would I want to circumvent code review for things that are most undoubtably code?
Since you can have your scripts under version control, you can likely use the same code review tools that you currently do. You can also use libraries such as pgTAP to test your schema and functions independent of the rest of your application, if you so choose.
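A minimal pgTAP test file might look like this (the function under test, user_count(), is a hypothetical name):

```sql
BEGIN;
SELECT plan(2);

-- Assertions run inside a transaction and are rolled back afterwards.
SELECT has_function('user_count', 'user_count() should exist');
SELECT is(user_count(), 0::bigint, 'empty database reports zero users');

SELECT * FROM finish();
ROLLBACK;
```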
These are all issues I've dealt with in a large production PostgreSQL environment. They're good things to think about, and you're right to consider them. In the end, you need to make a decision you and your team are comfortable with.
It was absolutely horrible to develop and a nightmare to test.
For one, there's nothing approaching ActiveRecord migrations in stability and ease of use, so schema changes on dev/test/deploy were a huge pain. Then, you can't really work in normal "auto-run tests on save" mode as you would with saner frameworks, syncing source from git to postgres is needed (the principal dev altogether preferred to edit directly in pgadmin and then sync). Then, SQL tends to be a lot harder to read than python/ruby, especially so if not well documented and with many join tables. And finally, you can't scale out the heavy stuff, you are limited by your master instance basically. 2/5, would not recommend.
SQL is rather readable these days. It's obviously not as easy to read as ruby or python and is slightly more verbose, but it is certainly readable.
Mileage will vary of course. There are cost/benefit trade offs with every architectural decision made.
As someone who can't read Ruby but can handle "any level" of SQL fluently, that's obviously not a very objective statement.
Most engineers I have met in my career have really depended on an ORM to do all of their heavy lifting. They can write some of the most eloquent ActiveRecord queries but would be unable to create the same functionality using raw SQL.
Again, deeply sorry if I offended. It was not my intention.
And I can sympathize. For example, I wouldn't even know how Rails maps a many-to-many relation to SQL tables, so I would have to use the rails console myself to traverse such a relation.
Those that are really good with an ORM can generally go into SQL with ease.
The above is not meant to be objective, just my general observations.
They are a fine tool, but I find it weird that people are using them without knowing the underlying technology which they are based on.
And you also don't need ORMs to do query parameterization.
You definitely can, but you'll have to roll your own watcher and sync, there's no Guard equivalent that I'm aware of.
> SQL is rather readable these days.
Used to think so as well. But then I had to read some pretty nasty sp's which calculated discounts depending on what user already has, and geolocation, and time of day, and day of month, and tons of other stuff, and the return value had to be a complex hierarchy of categories/products. So what would be 2 pages of python turns into 500 lines of SQL with some statements 80 lines long, plus it was still a pain to json-encode in python. The only positive thing about it was that I wasn't the one who had to write it.
[shameless plug] here's mine :) https://github.com/oelmekki/pgrebase
EDIT: sqitch is also often mentioned in that field http://sqitch.org/
SELECT-only reporting code like this is normally run against a query slave, and with a bit of automation you can stand up one of those pretty much as easily as another webhead.
> In any case, we need to do heavy calculations AND transactionally update some state.
In that case, I'd have the app server connect to both master and slave, run the heavy calculations on the slave the same way, and then send the updates to the master.
- Day-to-day tooling to work on it
- Switching costs (mental overhead, using multiple languages)
- Probably more stuff
The ruby code is just like all of your other ruby code and can be grokked and worked on by anyone who is up to speed on your ruby stack. The DB function meanwhile requires jumping through lots of hoops and will always feel like a second class citizen relative to the vast majority of your application code, unless you do a lot of work and maintenance (unlikely to happen without feeling distracting if this code is a relatively small slice of your code base.)
In some cases, this can be worth it for massive optimization wins. But the article doesn't really touch on why this might be a bad idea.
I do think it highlights the potential of systems like MADlib, where you can push the data crunching into the database while having that be abstracted away in your primary programming language (in this case, R). One could imagine a world where you wrote the Ruby code and it was somehow mapped to run on the database. But that's not the world we live in today, unfortunately. It's really a shame, because people using a database as capable as PostgreSQL with a fairly dumb ORM on top are under-utilizing its potential, for sure, but there's not really a great path afaik to leverage it. (I remember years ago MS introduced running C# inside of SQL Server; I am not sure if that ever panned out into something smarter than the dumb "upload code to database manually" model we are used to.)
But I'm a python/C#/.Net person.
The stored procs/functions go into a repo just like everything else does. It's not that big a deal. Alembic for migrations is pretty solid. And if you're dealing with a large code base, you log changes inside the DB itself. Use the database for version control.
It's really manageable. It requires that your dev team really knows SQL. But I don't think that's a bad thing.
It's fast, gets the job done, and if you're running a language that doesn't make it easy to deal with concurrency, it's nice to just let the database handle transactions.
For everyone who isn't facebook or google, this is the way to go, in my experience.
Let the db do the hard work, let your preferred language be the middleman, and have a reasonable front end framework do the presentation.
I can scale a web app to millions of users on a couple hundred bucks a month of hardware with this approach. You can too.
For non-distributable (or at least non-multi-master) DBs, you are often using your most expensive and least distributable resource to do your cheapest and most distributable work. My rule of thumb is that unless you need atomicity or the performance (as profiled) makes a real difference, do it outside the DB. Also has the side effect of avoiding system lock-in (at the expense of language lock-in which I think is ok).
Granted, it's subjective, but I've had more trouble scaling DBs than other systems, and it's usually CPU/mem of business logic pushing the boundaries.
Even if I didn't need it for a project, I would probably start there because, you know, that's the hammer I have, so SQL is the nail I'm going to start with.
I haven't had problems scaling DBs though. DB throughput has never caused a problem for me. What has caused problems is business logic put in what I consider the wrong places.
Business logic belongs in the database structure wherever possible because that's the single source of truth for your data. That's where all of your controls for data integrity belong.
For any app that has longevity, you are going to have multiple methods of accessing the data. There will be APIs and crappy junk that wants your data. You have to account for this. There will be users logging directly into your database. Sometimes it will be your boss.
Start with an ACID approach to your CRUD app, and you probably won't go wrong. At least not soon. And you have options for scaling when you need to. You can scale vertically or horizontally or both.
I see your logic, but I think this is misleading, and often the reason why people do inefficient things.
The pattern I see regularly is that if the database does more thinking, its planner will diminish the amount of actual work: it will touch fewer rows on disk, less data will be sent over the network, and there will be fewer round trips. Overall, for any extensive report you're likely to be paying a significant cost to distribute the work, often more than actually doing the work.
Of course, as you say it's subjective. Scaling databases becomes necessary at a point, but it may be because of developers not embracing them; I think this is why NoSQL has gained in popularity.
I'd question the credentials of any developer who doesn't know SQL. It's an essential skill across the entire IT industry. Not knowing it means you're obviously not serious about your job.
And here you're talking like programmers not knowing it is a normal thing... Are we working in the same industry?
There's a lot more to life than CRUD apps.
I'd like to support your comment, and argue for a more restrictive claim. Namely, a developer is more valuable the better they know their tools. If they're working with a relational database, then it's good for them to learn more about how to use it. However, if you're running a team that is building some software system, you ought to make technology decisions which make sense given the skills of your team members; so, if the team's SQL knowledge is weak and there aren't compelling performance reasons to push lots of work into the database, it's probably not the right choice.
I don't agree with the part of your comment that says, essentially, work with the tools your team already has.
There are some cases where you have to choose the technology that makes the most sense and your team has to get with the program. If you're storing data in any way, you have what I think of as an obligation to choose the best technology to store it.
I know and have worked with people who would write everything to a csv file because they know how to do that and prefer to do that. And toss it on a server somewhere with no redundancy or backups of any kind.
We all come from different places and have different jobs. But some times you have to suck it up and learn a new thing to be effective in this industry.
Sometimes a CSV file actually is the right answer. Sometimes a database of some flavor is the right answer.
The point I was trying to make and didn't do such a good job of was that if you're dealing with a web app, a database is probably the right answer, and if you're using a database, you should maximize what it can do for you.
There are a ton of jobs which my comments don't apply to. And yours sounds like one of them.
There are are entire areas of my work that also have nothing to do with collecting or storing data, and I recognize that SQL has no place in that work.
I was speaking to the bulk of my work and assuming (perhaps wrongly) that we were talking about similar things.
My bad. And you are totally correct about life being more than CRUD apps.
I'm sure you'll agree however that for every one of you, there's ton of developers who are less programmers and more assembly line workers gluing framework pieces together :-)
On GitHub I found a project that embeds Ruby in PostgreSQL. I haven't used it and don't know how mature it is.
Using the same scripting language in your stored procedures and application code might lessen the impedance mismatch. It doesn't solve testing, deployment/migrations and tooling.
If you encapsulate the database and related logic behind its own service interface, then as you get big enough to care about query performance, only the team working on that microservice needs to care about how complex/tricky/whatever the optimizations become!
I say this because I've seen it happen multiple times.
So, putting some of the business logic into the database can be done using modern development best practices.
Yes there are switching costs and it means that developers need to really learn SQL and PostgreSQL's advanced features but it pays off. And if you break the lockstep of db migrations and application deployments then you can easily have people specialize by language.
In fact, I would say that the era of the DBA is fading and what we really need are devops developers with a strong understanding of PostgreSQL who can do db architecture, db engineering, db design, and SQL coding/refactoring.
Probably my perspective is different from a lot of people because I worked many years in large ISPs where Network Engineering folks designed everything, built it in test labs, verified everything, documented how to deploy and then handed it over to an Operations team that actually set up the routers and installed the configuration code. I don't see coding as the top of the pyramid. Instead it is Engineering Design and Architecture of applications that we should all aspire to.
Then again we tend to work with the likes of SQL Server, Oracle and similar.
edit: (assuming i can't get it working with a jury rigged cronjob and the usual dumb solutions that will still use the ruby code. i'd try a bunch of those types of solutions first obv.)
All of these databases kick the crap out of HN darlings like Redshift and BigQuery. Special mention goes to Vertica which is free up to three nodes and/or 1TB data + 1 TB in flex tables. It is, like the rest, analytical AND transactional. Has proper admin features, proper edge case coverage and a truck load of features. It also has a very sensible licensing.
But then it takes another leap and says that this means your database should actually compute that value. I don't see why that's necessary, and it would bring with it all sorts of issues that other people have mentioned (maintainability, scaling, testing, coupling).
It seems you could equally solve the problem by adding an additional indexed 'ranking' column, and computing and storing the value at the same time you insert the row. It seems that's essentially what Postgres would do anyway.
Also, I'd note that the algorithm here is very simplistic, and even so, the author had to make a functional change in order to get it to be performant in Postgres. You can't just substitute an ID for a timestamp, just because both are monotonically increasing. The initial Ruby version of this algorithm treats a given 'popularity' as a fixed jump in some amount of time (e.g. 3 hours). The Postgres version treats it as a fixed jump in ranking (e.g. 3 ranks). Those are not equivalent.
This does assume that you can control all the database manipulation from a centralized place.
Yes, but if Postgres does it, it's literally impossible for the developer to screw up, whereas manually putting the ranking in the table can easily cause inconsistent records if you don't know what you're doing.
And even if you know what you're doing, next month's code change might not remember that it's important to update the ranking with the rest of the record (unless you're calculating it in a before_save hook or something).
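One way to get that guarantee from the database itself is a trigger; this sketch uses placeholder names and an invented ranking formula, not the article's:

```sql
-- Recompute the ranking on every write, so no code path can forget it.
CREATE FUNCTION set_ranking() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.ranking := log(greatest(NEW.score, 1))
                 + extract(epoch FROM NEW.created_at) / 45000;
    RETURN NEW;
END $$;

CREATE TRIGGER stories_set_ranking
    BEFORE INSERT OR UPDATE ON stories
    FOR EACH ROW EXECUTE PROCEDURE set_ranking();
```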
That said, despite some of the tradeoffs - for example with testing, or migrating to a new db (which is extremely rare anyway) - it is ultimately far better for client code not to have to know back-end implementation. I think things like graphql are finally making this abundantly clear.
What I have been dying for is a real-time DB BaaS that is ACID compliant, preferably relational, has a simple REST API, and allows me to write stored procedures that I can call from my client code. Horizon is probably the closest thing out there. Right now I'm using Firebase, and I'm sick of the absurd amount of client-side code I have to commit to to pull data from various parts of the database. It requires huge amounts of overfetching, unnecessary coupling of client-server code, horrible consistency support, overly loose type structuring, etc.
If somebody writes a postgres version of horizon I will pay big money to use it :)
That said, the real-time aspect is what I'm really dying for, plus a company supporting a BaaS. The ease of Firebase is so nice, and you can make really cool real-time apps with it very fast. I'm not sure if switching to a relational model would impose some technical limitation that JSON stores don't (besides obvious complexity). Postgraphql + socket.io is kind of what I'm thinking of as a start.
 - Dreamfactory.com
You cite this as the primary reason, but how much easier would this have been without stored procedures and especially views? I would think harder if anything.
I will say its importance is somewhat contextual though. The 3 principal contextual factors that make this reason significant are: (1) a large, complex database/model, (2) a rapidly evolving data model (such as during development or prototyping and fast business requirement changes), and (3) a clear division between back-end and front-end guys on your team. Those are the most important factors, but they are not the only ones. When these factors are significant, you really want to encapsulate DB access from client work. The client should simply declaratively say what they want, and this enforced contract is maintained by the back-end team while the back-end team is free to implement the details, presumably better than the front-end team could. For example, if a query initially involves just joining a fact to a dimension, a front-end user might think they could simply write this in their ORM. But if the model changes and this later requires 4 or 5 other joins to get the needed data (because the modeler or business or whatever decided it to be so), then the client code can remain as is while the back-end team rewrites the implementation details, maintaining performance and correctness as needed.
For me, it was easier to use Redis to keep a sorted set of article IDs. I updated the scores in my application code using Redis native functions, and then got a slice of IDs for a `SELECT ... WHERE IN (ids, from, redis)`.
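For the SQL half of that pattern, something like the following works (the `articles` table name is assumed; `$1` stands for the id array pulled from the Redis sorted set):

```sql
-- Fetch the rows for the IDs Redis ranked, preserving Redis's order.
-- array_position (PostgreSQL 9.5+) sorts rows by their position in $1.
SELECT a.*
FROM   articles a
WHERE  a.id = ANY ($1::int[])
ORDER  BY array_position($1::int[], a.id);
```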
It's probably because I'm a rubbish database engineer, so stick with my strengths.
This doesn't make you a rubbish database engineer. Scaling databases is complex, so it makes sense to have your source-of-truth database do only the things it must do (consistent updates, indexing the ranking value if needed for queries) and the things it's good at, and move everything else (calculating the ranking value) to another part of your system. If you can keep database load low enough to avoid sharding, that's a big win; you don't have to stop growing your DB, it just has to grow more slowly than Intel grows CPU performance and accessible RAM.
First off, plan and organize things. Yes, it would be nice if there were another level of abstraction, like Oracle's packages, but you can use schemas a bit more freely for namespacing. A little time spent thinking instead of just coding can avoid much of the spaghetti. Many good devs will think through the rationale of their APIs; this is no different.
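A minimal sketch of schemas as namespaces, roughly standing in for Oracle-style packages (schema, function, and table names are made up for illustration):

```sql
-- One schema per logical "package" of database code.
CREATE SCHEMA billing;
CREATE SCHEMA reporting;

CREATE FUNCTION billing.invoice_total(p_invoice integer)
RETURNS numeric AS $$
  SELECT sum(amount) FROM line_items WHERE invoice_id = p_invoice
$$ LANGUAGE sql STABLE;

-- Callers qualify by namespace, keeping the API organized:
--   SELECT billing.invoice_total(42);
```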
Second: use a tool that lets you manage the code bits like code. Migration tools like FlywayDB, etc. are not good for this because they sacrifice too much to the stateful nature of the database: you end up organizing things on a timeline rather than based on your structure. A better approach is the one in Sqitch (http://sqitch.org), where you organize files according to the organization of the database rather than its history... a very powerful approach. There are also verify steps which can incorporate testing, and testing the stored procedures is easy; it's managing the data in tables across tests that can be hard, and that is true regardless of where your business logic lives (insofar as those tests change data).
Third, there are good testing tools available: the guy that made Sqitch also makes pgTAP (http://pgtap.org/). Well worth a look. I haven't done much myself with this one, but will likely do so soon.
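For a feel of what a pgTAP test file looks like, here is a hedged sketch (the `order_total` function and its expected value are invented for the example):

```sql
-- pgTAP tests run inside a transaction so they leave no trace.
BEGIN;
SELECT plan(2);

-- Assert the function exists and returns what we expect.
SELECT has_function('order_total');
SELECT is(order_total(1), 42::numeric, 'totals the order lines');

SELECT * FROM finish();
ROLLBACK;
```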
This, along with validators, table indexing, and foreign keys, is among the more common shortcomings I see in devs who learned on Rails.
psql> \i test.sql
When the code is polished and ready to go live, I turn it into a view or function and move it into a permanent file (views.sql, functions.sql, ddl.sql, make.sql, or whatever you want to call it). This file can be run as many times as I want, idempotently, because it starts off dropping all the views or functions that it later defines. You can try instead saying "CREATE OR REPLACE ...", but PostgreSQL only lets you replace a view or function if its outer appearance stays the same (for views, the output columns can't change; for functions, the input and output can't change). So I find it simpler to just drop all of them ("DROP VIEW IF EXISTS foo ...") and then create them. (You usually have to drop them in the opposite order you create them, if any depend on one another.)

I wrap the whole file with BEGIN and COMMIT, so if there is any error the whole thing aborts. Actually, I wrap it in BEGIN and ROLLBACK, and run it to see if there are any errors. Only when it's solid, polished, and error-free do I temporarily change ROLLBACK to COMMIT, run it, and immediately change it back to ROLLBACK. The file that gets checked into version control ends in ROLLBACK, for safety.
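A skeleton of such an idempotent file might look like this (object names are placeholders):

```sql
-- views.sql: safe to run repeatedly.
BEGIN;

-- Drop in reverse dependency order before recreating.
DROP VIEW IF EXISTS recent_orders;
DROP FUNCTION IF EXISTS order_total(integer);

CREATE FUNCTION order_total(p_order integer) RETURNS numeric AS $$
  SELECT sum(amount) FROM order_lines WHERE order_id = p_order
$$ LANGUAGE sql STABLE;

CREATE VIEW recent_orders AS
SELECT o.id, order_total(o.id) AS total
FROM   orders o
WHERE  o.created_at > now() - interval '30 days';

-- Checked in ending with ROLLBACK for safety;
-- flip to COMMIT only for the real run, then flip back.
ROLLBACK;
```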
Later if I want to change something, I always do it first by editing the .sql file and then running it from psql. Then I check the changed file into version control.
For views and functions this is easy, because they don't keep their own data. You can drop and recreate them at any time. For tables, it's a little more complicated. You can't just drop and recreate them, or you'll lose all your data. So for tables I keep their definitions in a separate file. I run this file at an app's inception but seldom afterward. If I need to make a change to a table, I do it interactively, in the command-line tool ("ALTER TABLE foo ADD COLUMN blah ..."). Then I will probably edit the tables.sql file to reflect the change, just so it's in version control. But it's up to you. You can always pull the table definition from Postgres with the pg_dump command.
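The table workflow above amounts to something like this (table and column names are illustrative):

```sql
-- Applied interactively in psql, since tables can't just be
-- dropped and recreated without losing data:
ALTER TABLE accounts ADD COLUMN archived_at timestamptz;
```

Afterward the change is mirrored by hand into tables.sql, or the canonical definition is pulled with `pg_dump --schema-only --table=accounts mydb`.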
Would it be possible to substitute the integer count of seconds since epoch (or some other arbitrary base timestamp) and use that instead? Then you'd still have an immutable function to satisfy the index, but also you'd preserve valuable information about the relative time intervals between posts.
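A sketch of that idea, assuming Postgres: the function name, table, and the fixed base epoch (`1500000000` here) are all illustrative. Baking in a constant instead of calling `now()` is what keeps the function IMMUTABLE, which Postgres requires for expression indexes, while an integer `created_epoch` column preserves the relative intervals between posts.

```sql
-- Hypothetical HN-style ranking over seconds-since-epoch timestamps.
CREATE FUNCTION hn_score(points integer, created_epoch integer)
RETURNS double precision AS $$
  -- Age in hours relative to a fixed base epoch; no now(), so IMMUTABLE.
  SELECT points / power(((1500000000 - created_epoch) / 3600.0) + 2, 1.8);
$$ LANGUAGE sql IMMUTABLE;

-- The immutable function can now back an expression index:
CREATE INDEX articles_score_idx
  ON articles (hn_score(points, created_epoch) DESC);
```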
This observation is either profound or trivial, depending on the mindset. The hundred-fold improvements it brings (along with the immutability) are not trivial, by any measure.
This is NOT Ruby for God's sake:
SYSTEM_EPOCH = 1.day.ago.to_i
Calling a rails function a ruby function is unlikely to invalidate the entirety of the article.
I think they downvote because they are so ignorant I can't even believe it.
self.ignorance += 1
Pretty sure that's standard ruby.
Instead, I acknowledged demonstrated ignorance by cheekily posting a valid ruby class definition.