The admins at my work have always told us (i.e. people wanting to run data analysis) to avoid executing complex queries on production servers.
The workflow I use is to run simple select queries on prod databases to bring the subset of data I'm interested in down locally. Once I have the "extracts" I'm interested in, I'll transform the data locally using SAS.
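To make that concrete, a typical extract is something like this (table and column names made up for illustration); the idea is a narrow column list plus a date filter so the prod server does as little work as possible:

    SELECT order_id, customer_id, order_date, total_amount
    FROM orders
    WHERE order_date >= '2015-01-01'
      AND order_date <  '2015-04-01';

All the heavier joins and aggregation then happen locally on the extract.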
SAS has pretty powerful time interval routines, which is what I'd typically use to compute something like the example in the article.
My work has just started looking at "big data"; we have a new Hadoop database which is supposed to be used for that, so I assume if I ever need to start looking into data stored there it will make more sense to run the SQL "on the database". So the article has some useful info there.
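If I do end up querying the Hadoop side directly, my understanding is Hive lets you push that sort of interval calculation into the SQL itself. A rough sketch (made-up table and columns, and I haven't checked the exact behaviour against our Hive version):

    SELECT customer_id,
           DATEDIFF(ship_date, order_date) AS days_to_ship
    FROM orders
    WHERE order_date >= '2015-01-01';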
One thing to note here might be that if the network is the bottleneck, you'd likely be asked to do the opposite. In my experience, people who pay attention to indexes/partitioning can win DBAs' hearts, but DBAs often don't explain these things because most people don't listen. So to them it's easier to say "don't run complex queries".
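As a made-up illustration: if a big table is partitioned or indexed on a date column, filtering directly on that column lets the engine skip most of the data, while wrapping the column in a function usually forces a full scan.

    -- can prune partitions / use the index (assuming order_date is the key)
    SELECT customer_id, total_amount
    FROM orders
    WHERE order_date >= '2015-01-01' AND order_date < '2015-02-01';

    -- typically defeats the index and scans everything
    SELECT customer_id, total_amount
    FROM orders
    WHERE YEAR(order_date) = 2015;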
Part of the problem with getting people to pay attention to indexes/partitioning is that there's no easy way to surface that information to the end user. A person knowledgeable in SQL can query the right system tables to find it, but at that point you're pretty much thinking like a DBA in the first place. If the major vendors built a nice visualization of this into tools like SSMS/SQL Developer/PGAdmin3 etc., it might be easier (these might exist, I'm just not aware of them).
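For example, in Postgres the information is there if you know where to look, it's just buried in catalog views rather than surfaced in the UI (table name made up):

    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'orders';

Which is sort of the point: you have to already know pg_indexes exists to ask the question.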
So yeah, the end result is DBAs give the generic advice: "don't run SELECT * and use a WHERE clause when you're querying large tables".
Hopefully that is a dump of the data as well. At least with Hadoop you have a little more control over resources with the right configuration. When we run Hive on our main HDFS cluster without limiting mappers and reducers, it will happily bog down the entire cluster to give you what you need; limiting the mappers, while lowering your throughput, eliminates the risk of noncritical queries hogging resources. Even then, a copy of the data with dedicated resources is still preferred, because without a lot of mappers and reducers your Hive queries are going to slow to a crawl.
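For what it's worth, the knobs we use look roughly like this, as session-level SET statements in Hive (the exact property names shift between Hadoop/Hive versions, so treat this as a sketch rather than copy-paste config):

    SET mapreduce.job.queuename=adhoc;                            -- route to a capped scheduler queue
    SET hive.exec.reducers.max=32;                                -- cap reducers for the session
    SET mapreduce.input.fileinputformat.split.minsize=536870912;  -- bigger splits -> fewer mappers

    -- queries run later in the same session pick up these limits

The queue name "adhoc" is just whatever capped queue your scheduler defines for noncritical work.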
Solar_Fields and Bigger_Cheese, my apologies for not specifying the db. In this case I'm talking about a production-level data warehouse specifically designed to perform complex queries, not a prod app db. Yes, I agree: best practice is to separate the two. Glad you found the article interesting.