Ask HN: Working with CPG Retail Data
27 points by NickFanion on Dec 12, 2023 | 20 comments
I work in the consumer packaged goods (CPG) industry. Specifically the US market. Think of the sorts of items you buy on a consistent basis at a grocery, big box, club, or drug store. Everything from bagged salad to cough syrup.

Over the past several years I’ve worked at a mid-size company and moved from a basic analyst role to wearing several data hats, mostly taking on data engineering and business intelligence tasks. The hot thing right now is Power BI reporting, so I build out data pipelines, create data models, and design Power BI reports. This has led to a lot of career success at my company, but lately I’ve been more and more frustrated by the seemingly antiquated data practices in the CPG industry.

My company is a bit unique in that we are not a single retailer or manufacturer. We work with CPG brands and retailers across the entire country. This means we rely heavily on syndicated retail data from providers like Circana (merger between IRI and The NPD Group) and NielsenIQ (NIQ). They get retail scan data from almost every CPG retailer in the country. Except some retailers are exclusive to one platform, so you’ll never have a complete picture without both.

However, Circana and NIQ do not make it easy to extract what I consider medium-size data. Everything is a portal with data presented across various “dashboards” and rudimentary no-code report creator web apps. The minute you ask about an API or data transfer service they wonder why you would ever want to leave their platform. And when you do convince your company to purchase large data extract access or similar (why do we need that when we have “analysts” who can extract hundreds of small data pieces through the web portal?), you find out it’s incredibly fragile, inflexible, slow, and unreliable. The pricing for access is never transparent either.

In some cases, I can access a retailer’s data directly (for the manufacturers my company works with) through their web portal and reverse engineer an API. But that’s typically a limited data set and more supply chain focused.

Has anyone developed a successful data strategy in the CPG realm? Is there an opportunity here to solve these problems? Are these normal problems in tech/other industries?




Not specific to the CPG realm, but I've had some very good luck working with the OpenAPI-devtools Chrome Extension[1] (previous discussion here on hackernews[2]) to discover the underlying APIs of various sites that I want to scrape data from.

[1] https://github.com/AndrewWalsh/openapi-devtools

[2] https://news.ycombinator.com/item?id=38012032


This is a huge help, thank you. I've been using the Postman Interceptor extension, but OpenAPI-devtools looks tailor made for my use case. The hard part will still be figuring out the auth headers/calls (EDIT: the dev has added auth support since the HN post!)


These are very common problems. The data vendor wants you reliant on their platforms. Knowing exactly what you are looking at and how frequently is valuable to them. Keeping you tied to their tools and interfaces is also valuable to them. The friction is a feature, not a bug.


Part of the problem is that those 3p data providers aren’t sophisticated, but the other factor to consider is that the account managers and support workers assigned to you are less sophisticated. I work at a major retailer and one of the 3P vendors you mentioned does partner with us on providing data pipelines that we query directly. Feel free to pm me, I’m always happy to learn more about other people’s experiences in our industry


Great question. The CPG data landscape is a fragmented mess. We're actually tackling this at arena (https://arena-ai.com) to build out AI foundation models for the industry. We have some tips and potentially tools that might be useful to you. Would be great to get your input / hear your pain points if you're up for it


i can’t decipher where ai fits in your product, is it just marketing?


Accessing, extracting and storing shelf data in the CPG space is really messy and a constant wrestle.

NIQ gets its data from multiple sources and through partnerships. Apart from their credibility among executives, that data is their only moat, so I can understand why they would limit access to it.

For smaller companies, using web data from omni-channel platforms is a possibility but not without challenges. Platforms have become increasingly wary and make changes every week to make the job of those extracting data more difficult.

Are these problems normal? Well, I’m happy to see your post because my job is to help companies deal with exactly such challenges, which are getting harder and harder to solve.

Shameless plug: my service Unwrangle dot com helps people deal with these challenges. We sync up data for keywords / brands and expose via an API.


The lock-in goals are real, but so are legacy technology, contract limitations, and regional differences. From first hand experience I can tell you that every country, and often even every state/province, has enough variation in its data laws that there is no common data schema.

And it’s all built on mainframes. Nielsen is a 100 year old company that was run like a monopoly until 15 years ago, now they are owned by private money, saddled with trash systems, a massive loss in market trust, and retailers have learned to sell their data directly to CPGs.

They often can’t get you what you want because their tech is from before the modern API era, and no one who is capable wants to work for them. It might have started as clever lock in, now it is just circumstance.


I think there is a lot of opportunity in this space. It’s rare to see this many comments speaking of the pain points here.

I recently was a co-founder of a delivery app in the CPG space, and I think this space is ripe for disruption. We were running micro-fulfillment centers and staffing delivery drivers, but I wanted to pivot into a SaaS model taking on the data-plumbing between manufacturers and retailers. It absolutely amazed me that NOBODY seemed to have even half-decent APIs.

I wish people could all just get together and decide on things like a universal barcode system. You’d think things like electronic inventory transfers would be the norm, yet we were still dealing mostly in paper receipts.


A success I've had is building datasets from the electronic data interchange (EDI) transmissions we receive. EDI is an old standard, and everyone implements it slightly differently, but Stedi (https://www.stedi.com) has made it simple for me to turn EDI into JSON. Like you said though, there are still folks using paper receipts/PDFs.
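To make the EDI-to-JSON idea concrete, here is a minimal sketch in plain Python. It is not Stedi's API and not a real EDI parser: it assumes `*` as the element separator and `~` as the segment terminator (real X12 interchanges declare their delimiters in the ISA header), and the sample message is a made-up fragment loosely modeled on a product activity (852) transmission, not validated against the standard. The function name `parse_x12` is hypothetical.

```python
import json


def parse_x12(raw: str, element_sep: str = "*", segment_term: str = "~") -> list[dict]:
    """Split a raw X12-style string into a list of segment dicts.

    Assumes fixed delimiters; real interchanges define them in the ISA header.
    """
    segments = []
    for chunk in raw.split(segment_term):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the final terminator
        parts = chunk.split(element_sep)
        segments.append({"id": parts[0], "elements": parts[1:]})
    return segments


# Made-up sample for illustration only (not a validated 852 document).
SAMPLE = "ST*852*0001~XQ*H*20231203*20231209~LIN**UP*012345678905~ZA*QS*12*EA~SE*5*0001~"

segments = parse_x12(SAMPLE)
print(json.dumps(segments, indent=2))
```

Since every trading partner implements EDI slightly differently, a real pipeline would still need a per-partner mapping from segment positions to named fields on top of something like this.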


Yes, I've been in precisely that situation with precisely those vendors. We would get data in some weird proprietary format that was only readable in the vendor's awful desktop application.

We ended up spending a couple months reverse engineering these files (they ended up looking like database page tables or something) to finally break out of that ecosystem and get the data in our own databases.

It was a lot of work but absolutely worth it in the end. I have a feeling just the ability to natively read and convert those files is worth a lot but it's also the kind of thing that seems like it might be hard to sell...


Luckily I haven't had to use those desktop apps in 7 years, and today everything is through their web portal + .csv exports. Export restrictions still apply, but at least it's a standard format.
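For anyone stuck stitching many small .csv exports into one dataset, a rough sketch of the approach is below. The column names (`upc`, `week_ending`, `units`), the sample rows, and the `merge_exports` helper are all assumptions for illustration; real portal exports will have different layouts. It uses only the standard library and keeps the most recently read row when two exports overlap on the same key.

```python
import csv
import io


def merge_exports(files, key_fields):
    """Merge several CSV exports, keeping the last-seen row for each key."""
    merged = {}
    for f in files:
        for row in csv.DictReader(f):
            key = tuple(row[k] for k in key_fields)
            merged[key] = row  # later exports overwrite earlier ones
    return list(merged.values())


# Two hypothetical exports that overlap on one week (in-memory for the example;
# in practice these would be files opened from disk).
export_a = io.StringIO(
    "upc,week_ending,units\n"
    "012345678905,2023-12-02,10\n"
    "012345678905,2023-12-09,12\n"
)
export_b = io.StringIO(
    "upc,week_ending,units\n"
    "012345678905,2023-12-09,13\n"
    "012345678905,2023-12-16,9\n"
)

rows = merge_exports([export_a, export_b], key_fields=("upc", "week_ending"))
for row in rows:
    print(row["upc"], row["week_ending"], row["units"])
```

Whether "last export wins" is the right rule depends on how the vendor restates data, so treat the overwrite behavior as a design choice to revisit.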


would it be legal to sell


At Databoutique.com we’re trying to solve the web data accessibility problem with a marketplace that connects web data sellers and buyers. Buyers can get the data in three clicks, delivered to an S3 bucket or downloaded from the website. It’s pre-scraped, quality checked, and legally compliant. If a website is not listed, you can ask sellers to provide it. Sellers deliver data using standard data structures and set their own price. It's far from perfect since we launched three months ago, but we're working on it.


I've been a senior leader at a CPG shopper marketing and retail technology company out of Bentonville, AR, primarily supporting Walmart, but also working on Target, Amazon, and a few dozen other regional grocery stores, and can confidently tell you that you are not crazy, there is no unified solution, and if you stay in this industry you will be chasing the dream of a simpler and less painful solution forever. :P

You mention retailer web portals being the only source for some of the data you need, and that it's very manual. Historically Walmart has had the same situation with their Retail Link application, which vendors download sales data from (this has finally changed recently with the release of Walmart Luminate). No API. Just an antiquated UI where you can create scheduled reports that export Excel files. To that end, for the last 20+ years there has been a vibrant industry in Northwest Arkansas (where Walmart is based) of startups creating web crawler based systems that create, schedule, and download reports from Retail Link and then ingest them into a system that is more conducive to powering the analytics demands of vendors. This niche industry of web crawler based data and analytics companies has created a market (just taking some broad but somewhat confident guesses here) of dozens if not hundreds of millions of dollars of capital over the last 20 years. You asked if there's "an opportunity here to solve these problems." There certainly is and has been for a while, but unfortunately it's a painful path.

I feel pretty confident in stating that the "data strategy" in the CPG realm has been, is currently, and for the foreseeable future will be: persistence, resilience, and creativity. Keep trying to build something better despite the constraints, don't give up when it feels like an uphill battle, and never accept the apparent constraints of the technical environment. I've reverse engineered so many things that we would eventually turn into a product to provide value to our customers. Nothing feels better than telling someone you figured out a way around something that they were told was not possible.

Be creative. Be persistent. If I'm being completely honest, get weird. The solutions you'll need will likely not fit perfectly into best practices. You'll definitely help the company you work for today, but you'll also likely help your own career in the long run (if you choose to stay in the retail industry). It may feel like a pain in the ass today, but if you apply your persistence and knowledge to it, these challenges will be the thing that makes you a super valuable and versatile engineer that people want to hire in the future.


Thank you - you've described exactly what I try to do every day. This year has felt especially painful, which is why I made this post. I'll stay persistent and keep looking for the weird solutions!


I have a patent for an application I helped develop which allows CPGs to manage their product family codes and offers. I’m not sure if this is the sort of thing you are talking about or not. https://uspto.report/TM/77629828


I don’t have experience in this but it sounds like obstacles put in place precisely to stop the bulk data extraction you speak of, and force lock in.

If that’s the case you’ll find it very difficult to change.

I’ve found in other business domains, often things that make no sense and have a fairly easy solution are that way because of lock in strategies.


I do it in Europe and often work with data exported by my clients. They have huge trouble getting consistent, good-quality exports because Nielsen has it down to an art to change the portal and add export restrictions. It is their business model, since they often sell seat licenses.


www.mealme.ai has real-time retailer data. They scrape retailer websites I believe.



