
At my current job, our intranet site has lackluster performance due, in part, to limit/offset pagination. Unfortunately, the business treats the "reports" we author like glorified spreadsheets and wants the ability to filter on any column and order by any column. It makes it near impossible to tune/optimize.

The legacy system we're replacing used cursor pagination in a lot of areas, and that was perfectly acceptable to them, but now that we're going web-based it's not. Which is unfortunate, really, since cursor pagination seems vastly superior.
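To illustrate the difference, here is a minimal sketch using SQLite and a hypothetical orders table; only the seek predicate really changes between the two approaches:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.executemany("INSERT INTO orders (created_at) VALUES (?)",
                     [(f"2024-01-{d:02d}",) for d in range(1, 29)])
    PAGE = 10

    # LIMIT/OFFSET: the database still has to produce and throw away every
    # skipped row, so deep pages get progressively slower.
    page3 = conn.execute(
        "SELECT id, created_at FROM orders ORDER BY created_at, id "
        "LIMIT ? OFFSET ?", (PAGE, 2 * PAGE)).fetchall()

    # Keyset ("cursor") pagination: remember the last (created_at, id) seen
    # and seek past it; an index on (created_at, id) can answer this directly.
    page1 = conn.execute(
        "SELECT id, created_at FROM orders ORDER BY created_at, id LIMIT ?",
        (PAGE,)).fetchall()
    last_id, last_created = page1[-1]
    page2 = conn.execute(
        "SELECT id, created_at FROM orders "
        "WHERE (created_at, id) > (?, ?) "
        "ORDER BY created_at, id LIMIT ?",
        (last_created, last_id, PAGE)).fetchall()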



“It makes it near impossible to tune/optimize.”

I recommend using Elasticsearch or a NoSQL database to optimise performance. Relational databases can be slow for this use case.


Thanks, I'll have to search for some case studies on the topic to help me understand the approach. We use ElasticSearch in one small area to search for products across multiple points of metadata and I've always wondered if we were underutilizing ES in general.


What kind of NoSQL database are you thinking about? What strategy would you take with that database to optimize this problem?


This is hilarious - I am sharing my knowledge and getting downvoted for it.

Anyway, the gist of it is that you store data in denormalised documents where the searchable columns become keys of a single entry. This store is secondary storage, not the main data source. You write data to both - synchronously to your relational DB, asynchronously to the search store via a queue or whatever works best for your infrastructure. Searches are then made against it. If you go for ES you can rank results, assuming filtering is done with free-text search. I prefer ES; the flavour of NoSQL doesn't matter, but ES is great for free-text search.
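Roughly what that dual write can look like - a minimal sketch with hypothetical table and field names, where the relational insert is synchronous and a queued worker denormalises the row into a search document (the es_client.index call follows the elasticsearch-py 8.x style):

    import queue

    index_queue = queue.Queue()  # stand-in for SQS, RabbitMQ, etc.

    def save_report_row(db_conn, row):
        # 1. Synchronous write to the primary (relational) store.
        db_conn.execute(
            "INSERT INTO report_rows (report_id, customer, region, total) "
            "VALUES (?, ?, ?, ?)",
            (row["report_id"], row["customer"], row["region"], row["total"]))
        db_conn.commit()
        # 2. Hand the row to a background worker for the secondary search store.
        index_queue.put(row)

    def index_worker(es_client):
        # Drains the queue and writes denormalised documents in which every
        # searchable column is a top-level key.
        while True:
            row = index_queue.get()
            doc = {k: row[k] for k in ("report_id", "customer", "region", "total")}
            es_client.index(index="report-rows", document=doc)
            index_queue.task_done()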


You said in a sibling comment that denormalized analytics tables are a hack, but I don't see how this is any less of a hack. It's literally the same technique, except now you're adding substantial operational complexity with a whole extra database server (versus just extra tables in the same database). And it does not at all solve the problem of needing to figure out which fields to denormalize and which ones to leave in separate documents/collections.

And even if you do decide that it makes sense to split your analytics data into a separate database system entirely, document-oriented databases "ain't it".

I have very little experience with Elasticsearch, although I'm surprised to hear it being recommended for anything other than full text search. GP was talking about spreadsheet-style filtering, not full-text search.

But I do have a good amount of hands-on experience in production with Mongo, and I can say for sure that it is absolutely not a good option for an analytics database, and that I would much rather deal with traditional joins and filters. Even using it as a primary "application database" for a web app was a constant headache from beginning to end. I never thought I would miss MySQL so much, and I would never consider using Mongo again unless I had the very specific use case of storing a high volume of semi-structured data with no consistent schema. Even then, I would convert the data to a more traditional tabular/relational format for use in a data warehouse if the business analysts or data scientists needed to work with it.


I hate MongoDB with a passion. It may very well be that we are misunderstanding OP's use case and making assumptions. My assumptions are: 1) there are many types of reports (and as such many tables; ES can create docs on the fly), 2) reports are made of many rows (otherwise why compare them with spreadsheets). And following from the second assumption: once you add pagination you can no longer Ctrl+F for content, so you need full-text search.

For a set of reports with consistent column names and values made of aggregate or static data, what you are proposing works fine, since you can just increment counters or cache values as data comes in.
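For example, a minimal sketch of the increment-as-you-go idea, assuming a hypothetical per-region sales summary in SQLite (the ON CONFLICT upsert needs SQLite 3.24+):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE sales_by_region (
        region      TEXT PRIMARY KEY,
        order_count INTEGER NOT NULL,
        total       REAL    NOT NULL)""")

    def record_sale(conn, region, amount):
        # Keep the precomputed aggregate current on every insert so report
        # reads never have to scan the raw rows.
        conn.execute("""
            INSERT INTO sales_by_region (region, order_count, total)
            VALUES (?, 1, ?)
            ON CONFLICT(region) DO UPDATE SET
                order_count = order_count + 1,
                total       = total + excluded.total
        """, (region, amount))
        conn.commit()

    record_sale(conn, "EMEA", 120.0)
    record_sale(conn, "EMEA", 80.0)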

But for a use case where different types of reports have varying columns, you can just dump everything into ES documents and run basic aggregate queries. Or you can precompute data when making inserts.
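A minimal sketch of what such an aggregate query can look like against the standard _search endpoint; the index name and fields are hypothetical and assume the rows were indexed as flat documents with keyword fields:

    import requests

    query = {
        "size": 0,  # aggregations only, no hits
        "query": {"bool": {"filter": [{"term": {"report_type": "sales"}}]}},
        "aggs": {
            "by_region": {
                "terms": {"field": "region"},
                "aggs": {"total": {"sum": {"field": "total"}}},
            },
        },
    }

    resp = requests.post("http://localhost:9200/report-rows/_search", json=query)
    for bucket in resp.json()["aggregations"]["by_region"]["buckets"]:
        print(bucket["key"], bucket["doc_count"], bucket["total"]["value"])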

Anyway, I am assuming too much about the use case, my bad. I’d have to hear more about it to defend my point.


This sounds pretty typical for "analytics" workloads, which relational databases handle just fine. Maybe by "noSQL" they just meant something with column-oriented storage? But even that seems like it might be overkill, compared to setting up a couple of denormalized "analytics tables" to avoid the cost of complicated joins.
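Something like this, say - a minimal sketch of a flat "analytics table" with hypothetical names, where every filterable/sortable column is inlined and indexed so report queries never pay for the joins:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- normalized source tables (abridged)
        CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INT, total REAL, created_at TEXT);
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);

        -- flat reporting table: one row per order, joins paid once at refresh time
        CREATE TABLE order_report (
            order_id      INTEGER PRIMARY KEY,
            customer_name TEXT,
            region        TEXT,
            total         REAL,
            created_at    TEXT);
        CREATE INDEX idx_report_region  ON order_report (region);
        CREATE INDEX idx_report_created ON order_report (created_at);
    """)

    def refresh_order_report(conn):
        # Rebuild (or incrementally upsert) the flat table; reports then
        # filter and sort against order_report alone.
        conn.executescript("""
            DELETE FROM order_report;
            INSERT INTO order_report
            SELECT o.id, c.name, c.region, o.total, o.created_at
            FROM orders o JOIN customers c ON c.id = o.customer_id;
        """)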


Yeah, with all due respect, hacks like these are a bit amateurish. I heard of a dude, I think at Intuit, building their queues in a relational DB because they work “just fine”. Prompted a giggle or two. Use the right tool for the task at hand; don't do clever hacks, as they bite back later on.


"works just fine" might be a perfectly reasonable tradeoff if it avoids adding additional architectural complexity.


Depends what you define as complexity. What I described is trivial. Unless the userbase and/or data are small.



