When you look at your database as a dumb datastore, you're really selling short all of the capabilities that are in your database. PG is basically a stack in a box.
Whenever I started getting into Elixir and Phoenix and realized that the entire Elixir/Erlang stack was also basically a full stack on it's own...and that by default Phoenix wants to use PG as it's database...I may have gone a little overboard with excitement.
If you build with Elixir and PostgreSQL you've addressed almost every need that most projects can have with minimal complexity.
The upside was that when a vendor's licensing fees became prohibitive we could conceptually swap platforms. The downside was, well plenty. Like dealing with generic errors, building your own services that the platform stack already provided.
So, I love that we now have a OSS stack that folks are taking full advantage of rather than trying to abstract away the platform.
Eventually, the Java code continued giving us so much trouble that I mouthed off to our VP that I could rewrite this entire system in PHP in one night. He said, "Okay, do it."
So we did it...and because the core functionality was in stored procedures it was really easy to switch from Java to something else. The system never had a single issue the next 2 years that I was there after the move either.
Opting to not use vendor specific features just let's you make a commitment on where you want to be locked in harder.
Regardless, even when I was using Postgres as a dumb store, I was still using it to enforce uniqueness --- which is otherwise very hard to do anywhere else. Later I started to add foreign-key constraints (I know!). Slowly I found out that other things, like eligibility checking, complex calculations, and other processing, were always much shorter in SQL than in my procedural language (which had to pull the data from the database anyway to work on it). And all other things being equal, my criterion for how to do something is, "Which way is shorter?" That's usually also the fastest to run and the easiest to maintain (though I'm not talking about using one-letter names for variables and functions --- at least not always ;).
Now I'm slowly moving toward my middle language (PHP) being little more than a templating language. I think I heard that Oracle had even moved HTML formatting to the database in one of their products, but that's going a little too far.
BTW, don't move your HTML into the db :-) Separate presentation and data logic.
(We are teaching a course in Sweden on programming against PostgreSQL. It focuses on much of this as well as trying to look under the hood to deal with what happens when the abstraction fails to deliver what is expected.)
At the end, this is a longish path and one that took me personally a long time to make. But it is something that he more we share, the better off others will be I think.
1) You can (namely: PL/SQL). 2) The same situation as in SQL (the query part,
not procedural language) you have in Prolog (though SQL query is not
Turing-complete). It's a different paradigm of working with data, but it's
still a programming language.
> Instead you specify a result and let the db decide the algorithm.
The term you're looking for is declarative programming.
It wasn't an "we do this at scale" talk, but I'd love to see more experiments like it.
For the impatient: Skip to 17 minutes into the video, where he describes the previous architecture and what parts are replaced with Postgres.
Well, I will be conducting such thing in near future. From ELK stack, I never
used Logstash in the first place and used Fluentd instead (and now I'm using
a mixture of my own data forwarder and Fluentd as a hub). I'm planning mainly
to replace Elasticsearch, and probably will settle with a command line client
for reading, searching, and analyzing (I dislike writing web UIs).
All this because I'm tired of elastic.co. I can't upgrade my Elasticsearch
1.7.5 to the newest version, because then I would need to upgrade this
small(ish) 4MB Kibana 3.x to a monstrosity that weihgs more than whole
Elasticsearch engine itself for no good reason at all. And now that I'm stuck
with ES 1.x, it's only somewhat stable; it can hang up for no apparent
reasonat unpredictable intervals, sometimes three times per week, and
sometimes working with no problem for two months. And to add an insult to an
injury, processing logs with grep and awk (because I store the logs in flat
files as well as in ES) is often faster than letting ES do the job. I only
keep ES around because Kibana gives nice search interface and ES provides
a declarative query language, which is easier to use than building awk
> He even goes as far as implementing a minimal logstash equivalent (i.e. log parsing) into the database itself.
As for parsing logs, I would stay away from database. Logs should be parsed
earlier and available for machine processing as a stream of structured
messages. I have implemented such thing using Rainer Gerhards' liblognorm and
I'm very happy with the results, to the point that I derive some monitoring
metrics and was collecting inventory from logs.
...is that really a good reason to reinvent this whole solution, though? You're basically saying you're going to spend the time to replace your entire log storage/analysis system because you object to the disk size of Kibana. (Which, without knowing your platform specifically, looks like it safely sits under 100 megs).
The rest of your complaints seem to stem from not having upgraded elasticsearch, aside from possibly hitting query scenarios that continue to be slower-than-grep after the upgrade.
Maybe I'm misunderstanding your explanation, but if I'm not this sounds like a lot of effort to save yourself tens of megs of disk space.
The system being dependency-heavy and pulling an operationally awful stack
(Node)? Yes, this alone is enough of a reason for me. And I haven't mentioned
yet other important reasons, like memory requirements and processing speed
(less than satisfactory), elasticity of processing (ES is mostly query-based
tool, and whatever pre-defined aggregations it has, it's too constrained
paradigm for processing streams of logs), and me wanting to take a shot
at log storage, because our industry actually doesn't have any open source
alternative to Elasticsearch.
> Kibana. (Which, without knowing your platform specifically, looks like it safely sits under 100 megs).
Close, but missed. It's 130MB unpacked.
> Maybe I'm misunderstanding your explanation, but if I'm not this sounds like a lot of effort to save yourself tens of megs of disk space.
I'm fed up with the outlook of the whole thing. Here ridiculous disk space for
what the thing does, there slower-than-grep search speed, another place that
barely keeps up with the rate I'm throwing data at it (single ES instance
should not loose its breath under just hundreds of megabytes per day),
upgrade that didn't make things faster or less memory-consuming, but failed to
accept my data stream (I was ready to patch Kibana 3.x for ES 5.x, but then
I got bitten twice in surprising, undocumented ways and gave up, because
I lost my trust that it won't bite me again).
Sorry, but no, I don't see Elasticsearch as a state-of-the-art product.
I would gladly see some competition for log storage, but all our industry has
now is SaaS or paid software. I'm unhappy with this setting and that's why
I want to write my own tool.
One reason I always store full source data in the db.
Agreed, though with liblognorm rules you just shove every single variable
field into JSON field and that mostly does the job. And in the case you were
talking about logs with no matching rules, liblognorm reports all unparsed
logs, and my logdevourer sends them along the properly parsed logs, so no data
is actually omitted.
Liblognorm has only one major user: rsyslog, for which it was written, but at
some point I thought that it would be nice to have a separate daemon that only
parses logs, so I wrote logdevourer (https://github.com/korbank/logdevourer).
In the end I went with Django Rest Framework.
For example, considering the `passwd` table example https://www.postgresql.org/docs/9.6/static/ddl-rowsecurity.h..., allowing user to select their own pwhash, but no other users' pwhashes.
I find owner-dependent field-level permissions to be a fairly frequent requirement.
Listen/Notify work great for short-term job queues. For longer term ones, you have some serious difficulties on PostgreSQL which require care and attention to detail to solve. In those cases, of course, you can solve them, but they take people who know what they are doing.
Also in terms of storing images in the database, this is something that really depends on what you are doing, what your database load is, and what your memory constraints are. At least when working with Perl on the middleware, decoding and presenting the image takes several times the RAM that loading it off the filesystem does. That may not be the end of the world, but it is something to think about.
Also TOAST overhead in retrieved columns doesn't show up in EXPLAIN ANALYZE because the items never get untoasted. Again by no means a deal breaker, but something to think about.
In general, PostgreSQL can be good enough but having people know know it inside and out is important as you scale. That's probably true with any technology, however.
I've also used Elasticsearch, and I reckon that's pretty damn amazing.
Anyone wanting more in-depth information should read or watch this FTS presentation from last year. It's by some of the people who has done a lot of work on the implementation, and talks about 9.6 improvements, current problems, and things we might expect to see in version 10.
There's also some previous presentations on the same topic which are interesting. You can see the RUM index (which has faster ranking here): https://github.com/postgrespro/rum
For, say, a tag-based search system (think any forum or booru-style imageboard), is Elasticsearch completely overkill?
I realise I could probably just google it™, but it's a lot easier to understand a product's strengths when they're put in a situational context.
Solr had more functionality (largely by being able to throw more nodes at the problem), but for our purposes, PostgreSQL's fts was certainly good enough. There were a few cases where we had to be careful so the full text index would be used (because data distribution did not always match the planner's assumptions) but on the whole it worked very well.
I was using Django and rewrote the search just simply collecting a dictionary of terms that I would then explode into filter on the model.
A database of order 100k articles with something like 50 fields, about 20 of them foreign keys, bulk text fields of several to tens of KB responded within milliseconds on a 4 core 4GB machine under load.
The "weird thing" about it that I didn't understand at the time was that using a more complex query would often result in speedups. I understand it better now after this article since PG builds some sort of index on text fields.
PG is one of the infrastructure pieces that you install and then it just runs performantly without much fuss. There probably are cases where it fails horribly but in almost a decade of use I have yet to see a single operations failure.
I think there was a plan to fix this but I don't know if it happened yet because I'm still on the LTS release. And because I don't plan to use sqlite in production anyway.
I actually think SQLite in general is fairly underrated, but there are important limits.
PostgreSQL is my go-to db, but I love the ability to have the callbacks to SQL functions being in my main code. For lots of data analysis work, the way custom functions and custom aggregates work in SQLite is wonderful.
But again, a chasing hammer is not the same as a machine hammer.
My biggest concern for sqlite WAS that it is only embedded and not clients server which is a problem if I have more process trying to access it.
I solved it writing a Redis modules that embed sqlite  so you get a client server sqlite for all your relational/ACID need and redis for everything else.
It is nice to have some positive feedback once in a while.
Definitely if you have suggestions or feedback (the negative) please please share it.
Either open an issue, write me an email or keep this conversation going ;)
Out of curiosity, does anyone have a favorite article saved that does a great comparison of when to use certain databases?
I encourage everyone to challenge the conventional wisdom that using postgresql as a work queue is an anti-pattern. I've never found a queing software, out of countless options, that I was satisfied with. In concept, a queue is very simple. In practice, I've never seen the end of messy real world constraints that need to be imposed on queues.
We have somewhere around 50-100 queues in postgresql databases at MixRank, nearly all of them with millions of records or more, being hammered constantly. We get transactional consistency, automatic insertion with triggers, expression and partial indexes for prioritization and partitioning, joins, constraints and foreign keys, and everything else that comes with postgresql. It works great.
I don't want to appear too ignorant but why? I hear many claims of this but few explanations. The best I've heard is something about not being able to handle the load but I don't really buy that unless you don't need persistence/integrity.
As just one example, from a previous job: they were using SQL Alchemy in the code that interacted with the work queue, but somehow didn't realize that the DBAPI being implicitly transactional meant they'd need to manage transaction-state themselves. Consequently, they never released the implicitly opened transactions that every poll for new work created. No-one thought it was strange that the work queue system needed to be shut down weekly, so that a VACUUM FULL could reclaim the gibibytes of space that were wasted because the db couldn't clean up tuples (row version) that still appeared live to stale, abandoned transactions. Until I was all, "Uh, guys?"
The fix was obvious: disable implicit transactions. But that's just one example, if particularly egregious. I've seen and heard countless more.
But a lot of things are less about outlines than details and that is true here. If you do get it right, you get a work queue where jobs disappear properly when completed and if they fail remain in the queue to be retried.
The fact is, work queues are hard to get right on any technology. You have a ridiculous number of corner cases and failure conditions you need to account for and the question ends up coming down to what you want the default failure state to be.
That's one thing that makes this a hard problem. If you have open transactions for two weeks "Deleting this...." then autovacuum is going to be totally ineffective. So failure detection and recovery is an important (and probably the hardest) part of the problem.
PgQ looks to me like it is a solution to a message queue problem not a work queue problem.
If I understand you correctly, this handles your work queue case.
In my experience you have several critical issues:
1. What happens when a job silently fails?
2. What happens when a job takes a lot longer than expected to succeed?
If you solve the first with a timeout, the second leads to a job rerun. The best (only?) solution I have found is to have some awareness in the job queue of the fact that the job is currently being processed. In my previous work we used advisory locks for that.
Given your critiques of everything else out there (from what I gather from the rest of your comments in this thread), it seems like your identified a possible business opportunity.
What happens if someone trips over a power cord after a week of this job running? How do you detect and recover? What if, due to the size of data passed in, the job takes a week longer than expected?
This is why you cannot always reduce a work queue to a message queue.
Consider me interested. Any way I could be kept in the loop?
Not sure on timeline yet. Have not had the time to actually write the first generation of the code. Here was my initial announcement (Old, I know but I haven't committed the code yet and need to do that): http://ledgersmbdev.blogspot.se/2016/08/forthcoming-new-scal...
Wouldn't using a basic wrapper like Celery take care of most of that for you though?
Using an RDBMS as a work queue is an anti-pattern...
Like a lot of things, craftmanship depends on detail far more than it does on outlines.
The closest thing(1) was dbase/foxpro. You can actually build a full app with it. Send email from the database? Yes. Is not that wrong? Is wrong just because RDBMS (2) made it wrong, not because is actually wrong. Why is better to split in separated languages/run times/models a app than one integrated?
(1): Taking in consideration that neither Fox or any "modern" rdbms have take the relational model to it full extension.
(2): A RDBMS is a full-package with a defined role, and limited capabilities. A relational-alike language will not be a exact replica of that. Not even is demanded to implement a full-storage solution.
The biggest mistake the relational guys have commited is to think always in terms of full-databases instead of micro-database. Ironically, kdb+ (or lisp? or rebol?) could be the closest thing to the idea (where data+code are not enemies but friends).
For SQL, complex queries, and data warehousing: yes. It's an excellent database and I'm not sure why you'd pick another SQL DB unless it were a lot better on point two.
For high availability and scaling: no, absolutely not.
The problem with the latter is an arcane deployment process and arcane error messages that provide constant worry that you're doing something wrong. It's a many week engineering project to deploy HA Postgres, while HA RethinkDB takes hours -- followed by some testing for prudence... our testing revealed that it does "just work" at least at our scale. We were overjoyed.
The docs for Postgres HA and clustering are also horrible. There are like five different ways to do it and they're all in an unknown state of completion or readiness.
Of course if/when we do want complex queries and more compact storage, we will probably offload data from the RethinkDB cluster to... drum roll... a PostgreSQL database. Of course that will probably be for analytics done "offline" in the sense that if the DB goes down for a bit we are fine. HA is not needed there.
TL;DR: everything has its limitations.
I've read all sorts of hacks but I would love for someone to solve this for me!
When you do need an accurate count, phrasing the query so the results can be pulled exclusively from the table index is also usually good enough.
For example for our internal analytics/logs/metrics we use ELK and Druid but believe it or not these tools despite their purported scaling abilities are actually damn expensive. These new cloud "elastic" stuff cheat and use lots and lots of memory. For a bootstrapped solvent self-funded startup like us we do care about memory usage.
For customer analytics we use... yes Postgresql.
For counters and stream like things we don't use Redis we use Pipelinedb (Postgresql fork). For Cassandra like stuff we use Citus (Postgresql extension).
Some of our external search uses SOLR (for small fields) but Postgresql text search is used for big fields.
The only part of our platform we don't really leverage on Postgresql is the message queue and this because RabbitMQ so far has done a damn good job (that and the damn JDBC driver isn't asynchronous so LISTEN/NOTIFY isn't really useful).
This project appears to follow the JDBC spec and thus is synchronous with the exception of the Listen/Notify (of which you have to work the driver directly as there is no JDBC analog).
I have a some slightly tangential questions, which I'd love to hear people's thoughts on: How do you decide where to draw the line between what's kept and defined in the application and database? For example, how strict would you make your type definitions and constraints? Do you just accept that you'll end up duplicating some of it in both places? Also, how do you track and manage changes when you have to deal with multiple environments?
For the kind of work we do at my company - involving bulk upload, bulk inserts, bulk joins - thinking relationally is much more productive than thinking in objects, and really fast key-based lookups are far less important than really fast joins. Only configuration data lives in objects; other data is served up using a lightweight translation of the result set to JSON, with no business entity intermediary; one of the primary reasons being that the user gets to define their own schema, so any object would have a variable number of fields and types anyway.
Writing the same logic in different places is definitely a problem; you want to avoid that. We've taken to putting some logic in JRuby just to be able to use the same source in Rails and Java. For other logic, we have a Java service API endpoint that Rails can call. For maintenance purposes, it's really worthwhile not duplicating very complex logic in multiple places.
Relational databases - or rather, relational algebra, even if computed over something like Spark or Impala - is, to me, worth far more than something more suited for storing objects. Thinking in terms of relations is just more productive for efficient code than navigating object graphs.
I was reading the documents, looks like for every client request Postgres forks a new Process and uses shared memory model.
Using multi-processor threads/coroutines might be useful for scaling it further.
They're just too useful in helping to spread the load of all the IO required to return even one result.
It does have parallel scans though : http://rhaas.blogspot.co.uk/2015/11/parallel-sequential-scan...
> Back in the day
This seems somehow problematic: a DB trying to meet modern performance requirements by relying on how it was done over 12 years and 2 major kernel versions ago.
Even MySQL makes judicious use of threads for managing periodic tasks and parallel data parsing.
Per-connection processes does make some sense to me, but it seems wasteful when most connections to a DB are idle most of the time. Having to coordinate locks cross-process also seems wasteful; more syscalls and context switches than should be necessary.
Per-process vs per-thread overhead isn't that different in e.g. linux. Some things are more expensive with multiple processes (more page tables/more wasted space/increased process switch cost), others are cheaper (e.g. memory allocation, although that's getting better over the last few years).
> Having to coordinate locks cross-process also seems wasteful; more syscalls and context switches than should be necessary.
I don't think there's a meaningful difference here. We use atomic operations for the non-sleeping lock paths (which'd not be different in threads) and for sleeping locks when we need to sleep, we use semaphores for directed wakeups - but you'd need something similar for threads as well.
Really, the majority of the cost of threading is when you explicitly want to share more state, after processes have initialized. It's e.g. a lot harder to dynamically scale the size of the buffer pool up/down. It's also one of the things that made intra-query parallelism harder.
A newly developed product made today would use threads but we're not talking about a new product. Once the effort has been expended to make processes and shared memory work, especially for a product with a fairly constrained scope like a DB server, the argument becomes about whether it would be worthwhile re-writing today to target threads. So far for PG the answer is : no.
I think you are making assumptions on the basis that "threads == progress == better" that are not necessarily valid. For example do you have benchmark evidence that in-process locking would deliver significantly better performance for PG under workloads of interest vs the current XP locking scheme? It might, but I doubt it.
Here's a discussion on the topic that's only 16 years old:
Nope, we don't. You can use an external connection pooler like pgbouncer to achieve that however.
> Shared memory regions and SysV cross-process locks facilitate IPC. Back in the day this is how threads were done in user-land:
Well, that's how IPC is/was done in case of multiple processes. I'd not call that threading however, that'd imo require at least a single process space (i.e. shared virtual memory space).
The big advantage of processes is increased isolation. I.e. problems in one process are much likely to affect others, there's less locking required (e.g. in the memory allocator for local memory allocations, internally for mmaps and such). The big disadvantage is that dynamically scaling the amount of shared memory is quite ugly in multi-process models. It's hard to portably allocate shared memory after the fact and guarantee it's mapped at the same address in all processes, thereby making pointer usage hard/impossible.
I was thinking about control/data location. We had this three tier achitecturr, which is becoming 2-tier. I am seeing more control co-located with data for more optimal use. It will be useful to have coroutine like supprt, if someone integrates the whole Middleware layer inside postgres.
Mind you, you should be very cautious about loading it in a single instance as well, since it has unpleasant behaviors there, too, but you simply cannot trust it when replication is involved.
The problem is in the philosophy regarding data vs application in the two products. MySQL treats the data as king and allows any application to turn off safety measures. PostgreSQL treats data constraints as king and expects apps to honor them.
This means that the two aren't really competitors any more than either is a competitor to SQLite. MySQL is a fine choice when you want a dumb store for your data and trust the application to do all necessary checks. PostgreSQL is a lot more pedantic and that makes it better when you want multiple front-ends to the same data.
The only real problem with PostgreSQL replication is the plethora of choices surrounding it.
So with its over 10 years of history (6 of those in its current form) the built-in replication has gotten quite a bit of testing.
I'd be curious if anyone passing by can recommend a FOSS replacement/alternative for PostGIS.