
Ask HN: Data Scientists, how to identify patterns inside oil-movement data - audiometry
Countries import oil from across the world. They have flexibility in choosing the source, and the choice is often sensitive to relative prices. For example, Hurricane Harvey shut down refinery operations on the US Gulf Coast, leaving excess oil. India and China bought most of it at a discount. It replaced some oil from the Persian Gulf, which cost relatively more.

I want to identify the most significant switch patterns.

Since 2015, I have recorded every global journey of a ship moving oil from one location to another, grouped as tuples: (month/year, originCountry, destinationCountry, totalVolume).

Markets are constantly changing. Switching patterns that made economic sense in 2015 might not make sense today, and other patterns emerge.

Oil markets have trend growth and seasonal cycles, so these switches are not zero-sum actions where an increase in X necessarily results in a decrease in Y. E.g. China’s refinery demand is growing; next month, instead of increasing Persian Gulf imports by 100k bbls/day, maybe it only increases them by 40 kbd and takes an incremental 60 kbd from North America.

The source data is very granular, so there are lots of minor, noisy movements that aren’t meaningful.

Some countries are semi-groupable. Many of the countries in the Persian Gulf (PG) export similar types/prices of oil.

My current idea: for each originCountry that sends oil to a given destinationCountry, calculate the rolling average and standard deviation of the imports over the last N months. Then calculate the z-score for each monthly (date, origin, destination) tuple.

I can then try to look for sets of origins where “positive z-scores for (originA, destinationX) are correlated with negative z-scores for (originB, destinationX)”.

These are subtle signals buried in a lot of noise. I guess there are more skillful modern data science approaches. This is not “Big Data” in any sense: it is three years of essentially monthly data. I fear that limits the techniques I can deploy.
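
To make the z-score idea concrete, here is a minimal sketch in plain Python of the two steps described above: a rolling z-score per (origin, destination) series, then a pairwise correlation of those z-scores for one destination. Several things here are assumptions rather than part of my actual setup: the function names (`rolling_zscores`, `pearson`, `switch_candidates`) are made up for illustration, the month is assumed to be a sortable key such as "2017-09", the rolling window uses only the N months before the current one, and a month with no recorded journey for a pair is treated as zero volume.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def rolling_zscores(values, window):
    """Z-score of each value against the mean/std of the preceding `window` values.

    Returns None where there is not enough history or the window has no spread.
    `window` must be at least 2 so that a sample standard deviation exists.
    """
    scores = []
    for i, v in enumerate(values):
        if i < window:
            scores.append(None)
            continue
        hist = values[i - window:i]
        sd = stdev(hist)
        scores.append((v - mean(hist)) / sd if sd > 0 else None)
    return scores


def pearson(xs, ys):
    """Pearson correlation; None if fewer than 3 points or a series is constant."""
    if len(xs) < 3:
        return None
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def switch_candidates(records, destination, window=6):
    """Rank origin pairs for one destination, most negative correlation first.

    records: iterable of (month, origin, destination, volume) tuples.
    Returns a sorted list of (correlation, originA, originB).
    """
    records = list(records)
    months = sorted({m for m, _, d, _ in records if d == destination})
    volume = defaultdict(float)  # missing (origin, month) reads as 0.0 (assumption)
    origins = set()
    for m, o, d, v in records:
        if d == destination:
            volume[(o, m)] += v
            origins.add(o)

    z = {o: rolling_zscores([volume[(o, m)] for m in months], window)
         for o in origins}

    results = []
    for a, b in combinations(sorted(origins), 2):
        # Only compare months where both origins have a defined z-score.
        pairs = [(x, y) for x, y in zip(z[a], z[b])
                 if x is not None and y is not None]
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if r is not None:
            results.append((r, a, b))
    return sorted(results)
```

Calling `switch_candidates(records, "China", window=6)` would return the origin pairs whose z-scores move most strongly in opposite directions at the top of the list. With only about 36 monthly points per pair and many pairs to compare, some strongly negative correlations will show up by chance, so the ranking is a list of candidates to inspect rather than evidence of a switch.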
======
PaulHoule
I think sharing the data with others would be a good start.

