My job title is Senior DevOps Engineer. I don't have a high school diploma, just what they call "some college" (less than 2 years). I've been working as a sysadmin since the late nineties. I've done stints as a Director of Operations and as an entrepreneur (trainer/consultant), but mostly ops work. I've been working remotely for the past 3 years (thanks COVID!). I live in a rural area. I try to put in about 40 hours a week, though it's been a bit under that recently since we're moving and it's been a distraction. The job opening was posted for under 200K, but I asked for a bit over and got it. I was told my experience and enthusiasm were far above the other candidates'. Hope this helps! (I majored in biochemistry when I was at uni.) I'm in Texas now.
Sounds like a job for the AWS Athena service. Put all the files into S3 and you can query across the whole collection. See https://aws.amazon.com/athena/
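Assuming the files are something like newline-delimited JSON (the table name, columns, and bucket path below are just placeholders), a minimal sketch would be:

    CREATE EXTERNAL TABLE logs (
      ts      string,
      level   string,
      message string
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION 's3://my-bucket/logs/';

    -- then query across every file under that prefix
    SELECT level, count(*) AS n
    FROM logs
    GROUP BY level;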
Not an answer to your question but let me know if you would like a hand with the performance issue. My background is in ops, and I’ve done a fair amount of Postgres administration, including performance management. I’m between things right now and would be happy to look at it free of charge if you’d like.
Thanks for the offer. The issues I have experienced are unfortunately not in the category of "we need a seasoned Postgres dev in here to look at this". More like unfixable.
For example, if a table has a fair amount of dead tuples, the planner can pick the wrong index because it doesn't understand the cost of skipping dead tuples. Other problems are just how Postgres is designed. For example, index and table bloat have been serious issues even in the newest versions, requiring constant REINDEX and pg_repack runs to bring performance back to normal. MVCC is the cause of many of these headaches. Just the fact that a single careless or accidental long-running transaction can prevent vacuuming from reclaiming old rows, and thereby cause serious performance degradation, has made me frustrated enough to look for alternatives.
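To make that concrete, this is roughly what I look at when it bites (standard stats views, nothing setup-specific):

    -- dead tuples autovacuum hasn't been able to reclaim yet
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC;

    -- transactions open long enough to hold back the vacuum horizon
    SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start;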
Some of these issues are things I've raised on the pgsql-general list and gotten a "yes, the planner could have done a better job here" type of response from the core team developers. In some cases it's required rearchitecting, or changing queries to be "stupid": doing more fetches and joining client side to work around the fact that the planner cannot optimize them properly.
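To illustrate the "stupid" query shape (schema made up): instead of the one join the planner keeps mis-planning, you end up doing two trivial fetches and combining the rows in application code:

    -- first fetch
    SELECT id, customer_id, total
    FROM orders
    WHERE created_at > now() - interval '1 day';

    -- second fetch, with the customer_ids collected by the application
    SELECT id, name
    FROM customers
    WHERE id = ANY ('{101,102,103}');

    -- the actual join then happens client side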
Just to be clear, I love Postgres and have long championed it. I'm just a bit tired of fighting it. I also feel it's lagging behind in some areas.
Fair enough. I'm sure you are aware that you can safeguard against careless or accidental long-running transactions by setting "statement_timeout". Of course that wouldn't allow intentional long-running transactions, so not really a fix (although you could have a replica with a longer "statement_timeout" for long queries against that). But I get what you are saying.
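For example (values made up, and this only caps individual statements, not a transaction that just sits idle):

    -- default for the application role
    ALTER ROLE app_user SET statement_timeout = '30s';

    -- or per session/transaction
    SET statement_timeout = '30s';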
Long-running transactions are an antipattern, to be sure. Whenever I find one, I try to split it into smaller, shorter-lived ones.
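The usual shape of that, using a big purge as an example (table and column names made up), is to loop small batches from the application, committing between each one:

    -- one iteration; commit, then repeat until it affects 0 rows
    DELETE FROM events
    WHERE id IN (
        SELECT id
        FROM events
        WHERE created_at < now() - interval '90 days'
        ORDER BY id
        LIMIT 5000
    );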
In some cases it's tricky because you lose transaction isolation; think of a data export, for example. To preserve snapshot consistency you have to rearchitect the data model itself to become temporally aware. Or the opposite: A big import that must be atomic has to be rearchitected so that the ingested data is not integrated into the running system until you "flip" something at the end.
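The import side of that "flip" tends to look something like this (names made up): build the new data in a staging table at leisure, then swap it in with one short transaction:

    -- slow part: load outside the live table
    CREATE TABLE orders_staging (LIKE orders INCLUDING ALL);
    COPY orders_staging FROM '/tmp/orders.csv' WITH (FORMAT csv);

    -- fast part: the atomic "flip"
    BEGIN;
    ALTER TABLE orders RENAME TO orders_old;
    ALTER TABLE orders_staging RENAME TO orders;
    COMMIT;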
These workarounds should be unnecessary — these things are what ACID transactions are for, after all. As I understand it (I could be wrong), long-running transactions are not such a big problem in other databases that use in-place updates with undo/redo logging.
Surely they do, but some of Postgres' problems are architectural, and other databases with other architectures don't suffer from them.
That said, this was not the point of my post, and I was hesitant to mention any concrete issues because I didn't want the thread to be about them, but rather about how Postgres today stacks up against the commercial databases that are in use.