Anecdotal data. I released a couple of things on Product Hunt. Popularity-wise one did well, the other went nowhere. Financially, it was completely the opposite. Boring stuff is very successful. I haven't seen a "popular" Product Hunt thing that I am willing to pay for in ages!!
People pay for pain meds, product hunt featured products are colorful vitamins.
It's a little like competing over the store check-out aisle shelves. It's unlikely to have what you actually need, but there might be something shiny or tasty. And if you do want more next time, you'll probably go shop around instead.
Hey, I kinda agree. Customers are #1 and #2 etc. I believe we have quite some happy customers.
But this is a forum managed by a VC so some people are certainly interested in this. I also thought it was interesting in the context of a side project launched on PH.
Feel free to check my Twitter (link in my bio) on how I think about and interact with customers.
It’s an important number for the CEO and CFO, but it means nothing about the business or its success. “Funding raised” is completely uncorrelated with how successful the company is.
This is just patently untrue. The failure rate of venture-backed startups is 75%. The failure rate of all startups is 90%. Funding is correlated with a lower failure rate.
You said ' “Funding raised” is completely uncorrelated with how successful the company is. ' I showed you this statement is wrong, by providing you a statistic that proves they are correlated. We were not talking about any specific case, I don't know why you've opened with that. The coin flip statement is true but also completely unrelated to our discussion.
I agree with @tnolet that it's not the number one stat I want to know about any given company. But, at the very least, it does mean that they were able to convince some people who look at thousands of companies a year and then interview hundreds of them to give them $12M. That's not nothing. Granted, $12M is not a whole lot of money in VC land, but there is some signal in that data point.
All that means is that they have $12 million in funding. Assuming anything else is a logical fallacy; there is no more information available in that number.
Yes, there is, just as I said: they were able to convince someone who looks at thousands of companies a year, then interviews a handful of them to give them $12M. You don't think they prayed really hard to the money fairy and got their wish granted, do you?
No, you shouldn't read much more into that, but there factually is information behind the number.
Well no. It means they convinced someone to give them 12M. It says nothing about the credentials of the lender, or that those credentials are even relevant or accurate.
Not to mention that convincing a VC, regardless of clout, is not about objective profitability or anything correlated with success.
It’s about convincing a person that you can make money, and people are incapable of being objective.
Even ignoring all the other factors, there is a disconnect between only managing to get five (free) upvotes in one forum, and finding a group of people willing to bet 12 million on it in another.
> We consider a 2XX (Success) and 3XX (Redirection) status codes successful
I feel like this is flawed, especially considering 1/2 of the successful responses were 3XX. It's possible that they had just linked a short URL that was a redirect, but it's also possible that the product was shuttered and a redirect put in place to a replacement product, the company homepage, or even an acquiring company. I don't think there is an easy way to tell based just on the response code, and I'm not sure you could even programmatically determine it unless you had samples of what the pages looked like on launch day (maybe compare today vs the Internet Archive?).
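One rough way to separate the harmless redirects from the suspicious ones is to compare the host a link originally pointed to with the host it finally lands on. A minimal sketch, assuming you already have the original URL, the final URL after following redirects, and the final status code; the function name and labels are made up for illustration and are not from the article:

```python
from urllib.parse import urlsplit


def _host(url: str) -> str:
    """Lowercased hostname without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_link(original_url: str, final_url: str, final_status: int) -> str:
    """Label a link-check result; the labels are illustrative only."""
    if final_status >= 400:
        return "failed"
    if _host(original_url) == _host(final_url):
        return "alive"  # same site, e.g. http -> https or a path change
    # Different host: shortener, acquirer, squatter... needs a manual look.
    return "redirected-elsewhere"


print(classify_link("http://example.com", "https://www.example.com/", 200))
print(classify_link("https://short.example/abc", "https://other.example/", 200))
print(classify_link("https://example.com", "https://example.com", 404))
```

This still can't tell a short URL from an acquisition redirect, but it at least shrinks the pile you'd have to check by hand (or against the Internet Archive).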
Comments here are really surprising. I really struggle to understand Product Hunt. I've spent multiple sleepless nights scrolling through it and I couldn't find a single meaningful or useful thing. I guess if you start splashing water around the streets, you will find a few perfectly shaped puddles. But I have never stumbled upon anything that made me think "wow, this is awesome", not even "this might be useful".
When they first started a few years ago I used to visit them quite a bit, and often found interesting things, but most (all?) that I actually used ended up being discontinued a year or so later.
I don't really visit the site anymore, but I did take a look the other week, and like you, nothing really stood out to me.
Fair warning: This is a blog post advertisement for ScrapingBee. The data is still interesting.
The most interesting chart is one of the last: Proportion of Failures over time. As expected, more recent product links are less likely to 404 or 5xx.
Going back to 2014, almost 1/3 of the featured links give a 4xx or a 5xx response. That’s a lot!
More surprising, links as recent as 2020 show a 1/4 failure rate. Those projects basically launched on PH, then shut down shortly afterward.
Moreover, this analysis can’t actually account for products that have been shuttered but still have landing pages online. It’s ultra cheap to keep a placeholder “Sorry we’re closed” page online, so I imagine a lot of these projects are shut down but counted as “success”.
Subjectively, this matches what I’ve gathered from watching PH. Getting a PH featured product listing seems to be a badge of honor, but PH users aren’t really interested in using 99% of the products and the submitters aren’t actually interested in building them past proof of concept. Recently, the bulk of postings seem to be advertisements for paid information products or pay-to-join communities.
I find your warning a bit unfair as there are literally no CTA inside the blog content promoting our product and only 2 internal links toward other educational posts.
But anyway,
I thought about taking a random sample of pages that return a "200", let's say 150, and manually tagging them as "dead" or not.
I would then reuse that "dead but still a 200" ratio for all the pages, but I was afraid that I'd need to tag many more than 150 pages to get a statistically significant result.
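For a rough sense of scale, the textbook normal approximation for a proportion gives the margin of error for a given sample size, and the sample size needed for a given margin. A back-of-the-envelope sketch, assuming a simple random sample, the worst case p = 0.5, and no finite-population correction (the function names are just for illustration):

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)


def sample_size(margin: float, p: float = 0.5, confidence: float = 0.95) -> int:
    """Smallest n whose margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


print(f"n=150 -> +/- {margin_of_error(150) * 100:.1f} points")  # +/- 8.0 points
print(f"+/- 5 points -> n={sample_size(0.05)}")                  # n=385
```

So 150 pages would pin the ratio down to within about 8 percentage points at 95% confidence in the worst case, and the interval gets tighter if the true ratio is far from 50%.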
> I find your warning a bit unfair as there are literally no CTA inside the blog content promoting our product
It’s obviously blog content designed to promote your product, hosted on the company’s product website. I don’t see how the FYI is unfair.
I added it because the content was valuable but HN can be finicky about blog posts from companies advertising their own products. Trying to get ahead of indignant dismissals.
> It’s obviously blog content designed to promote your product, hosted on the company’s product website. I don’t see how the FYI is unfair.
There are so many blog posts posted here that could fall under the "content marketing" umbrella if you want to be strict. I feel like there's no problem with that if the content is valuable and people like/upvote it. After all, this is a platform that is doing marketing for YC, where YC companies are supposed to post their content too.
That "warning" also stuck out to me as a bit unfair as I was even looking for how it hooks into ScrapingBee (as I was curious how these scraping-aaS platforms interface with custom code) and couldn't find anything.
Yours ended up coming across as the indignant dismissal. As a community member I didn’t appreciate the warning. From the second paragraph on, your comment was an interesting contribution, though. I’m surprised that many of those PoC businesses have stayed online at all, but I guess domains are easy to renew.
For what it's worth, you could watch how quickly the confidence intervals converge as you sample the data, to see if it's worth continuing or if the variance is too high and whether you'd have to check thousands of pages by hand:
from scipy.stats import binomtest

# Running tally from the manual tagging; these numbers are placeholders, not real data.
landing_page_counter = {"dead": 12, "total": 150}
chance_of_dead_page = binomtest(landing_page_counter["dead"], landing_page_counter["total"]).proportion_ci(confidence_level=0.90)
print(f'Chance of a dead but existing landing page (90% Confidence Interval): {chance_of_dead_page.low * 100:.2f}% to {chance_of_dead_page.high * 100:.2f}%')
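To actually watch the interval tighten while tagging, you could recompute it every few dozen pages. A sketch using only the standard library: it swaps in the Wilson score interval (scipy's proportion_ci uses the exact Clopper-Pearson method by default and offers Wilson via method='wilson'), and the tags list is made-up data standing in for real manual tagging:

```python
import math
from statistics import NormalDist


def wilson_interval(dead: int, total: int, confidence: float = 0.90) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = dead / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical tagging results in the order pages were checked (True = dead).
tags = [i % 10 == 0 for i in range(300)]

dead = 0
for n, is_dead in enumerate(tags, start=1):
    dead += is_dead
    if n % 50 == 0:  # report every 50 tagged pages
        low, high = wilson_interval(dead, n)
        print(f"n={n:3d}: {low * 100:5.1f}% to {high * 100:5.1f}% "
              f"(width {(high - low) * 100:.1f} points)")
```

If the width stops shrinking meaningfully between reports, tagging more pages probably isn't worth it.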
> I find your warning a bit unfair as there are literally no CTA inside the blog content promoting our product and only 2 internal links toward other educational posts.
I've worked in or adjacent to the content marketing world long enough to know that a CTA is not necessary for the post to be marketing/advertising. One of the major goals of content marketing is to establish the authority of the brand. You are well aware that the raison d'etre of that post is to spread awareness of and establish the authority of ScrapingBee.
It doesn't mean the post is not interesting, useful or valuable. But that post exists fundamentally for marketing/brand purposes.
Parent's warning is completely fair, especially since they immediately point out the value of the post.
I thought the same thing until I realized that dead domains are often snapped up by squatters/spammers (or just by other people who want that domain for actual reasons) so may not error when requested.
> Fair warning: This is a blog post advertisement for ScrapingBee. The data is still interesting.
Sure, but it's no different than any other blog post from a company. And framing it that way is quite disingenuous since the post pretty much only sticks to the topic and doesn't overtly promote their product.
Don't forget cases where a product's domain expired and has since been reused for something else entirely (or for a product with a similar goal but a new vendor).
> More surprising, links as recent as 2020 show a 1/4 failure rate. Those projects basically launched on PH, then shut down shortly afterward.
This would very well fit a "fail fast" attitude with testing MVPs, wouldn't it? At least that's what I would guess. Got a great start with PH but didn't move on from there, so the domain was not renewed...
Does a great start on PH mean very much? For most products, the chances that your target audience is there seem very low. I would love to see this data compared to all products/startups in general, but of course that's probably difficult to do.
Bingo. IME Producthunt is lazy marketing for indie founders. Their main user base is other founders, wannabe founders and super tech literate power users who are itching to use “the next new beta thing”, who’ll be a tiny percentage of any userbase.
Granted there are cases where that market IS aligned with your product, eg if you built a low cost site-builder or low cost social media publishing platform
The most surprising thing to me is not the failures but that the devs won't even pay a few bucks a year to keep the domains online. If I put time and effort into building a product that went viral and got a bunch of users, I'd at least leave the front page of it up indefinitely as some sort of tombstone.
Funny story: on the day that Product Hunt posted its Show HN, someone (unbeknownst to me) posted my startup on Product Hunt. It was fun to ride a little wave on top of a big wave!
My startup is still around, [1] and we posted on PH one or two other times when we launched new products. Even though we had some powerful hunters (thanks to our early presence on the site), I found it took too much time to be worthwhile for follow-on product releases. I'd be interested to know if others have had the same experience, or if they have tips for how to get a meaningful bump out of subsequent posts.
2. Which sectors are doing well - what are the trending tools.
Skimmed through the PH APIs, don't think this is possible. Courtland's Indie Hackers (they have Stripe-verified revenue) may be of help - a quick google resulted in this¹ result
Revenue would be near impossible to do, however we could have analyzed traffic using SimilarWeb or Ahrefs API.
We could also have analyzed the sitemap to check the last update date.
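The sitemap check could look roughly like this. A sketch, assuming the site publishes a standard sitemaps.org sitemap with lastmod entries (many don't, and the values aren't always trustworthy); the function name and sample XML are made up, and fetching the sitemap is left out:

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def latest_lastmod(sitemap_xml: str) -> Optional[date]:
    """Return the most recent <lastmod> date in a sitemap, or None if there is none."""
    root = ET.fromstring(sitemap_xml)
    dates = []
    for node in root.findall(".//sm:lastmod", SITEMAP_NS):
        text = (node.text or "").strip()
        try:
            # Full W3C datetimes start with YYYY-MM-DD; skip anything shorter.
            dates.append(date.fromisoformat(text[:10]))
        except ValueError:
            continue
    return max(dates, default=None)


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2019-03-01</lastmod></url>
  <url><loc>https://example.com/blog</loc><lastmod>2020-11-15T08:30:00+00:00</lastmod></url>
</urlset>"""

print(latest_lastmod(sample))  # 2020-11-15
```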
Those articles are really fun to write (I haven't written this one, I'm just the editor), but at some point you have to stop, otherwise you end up with a 20k-word essay.
Agreed. A subset of products are "stripe verified" on Indie Hackers - should be a good enough population.
I think the parent article is interesting, thanks for your contributions. I am not saying that the same should have contained revenue, performance data - just that it would be interesting to see :)
Unrelated to the article - is it just me or is this scrapingbee product borderline nefarious? From the homepage:
> Thanks to our large proxy pool, you can bypass rate limiting website, lower the chance to get blocked and hide your bots!
> Scrapingbee helps us to retrieve information from sites that use very sophisticated mechanism to block unwanted traffic, we were struggling with those sites for some time now and I'm very glad that we found ScrapingBee.
It really depends. There are plenty of legitimate uses for scraping (for example, I've been involved with academic research that involved scraping Twitter search results), and it's only really feasible to collect the amount of data you need using scraping plus paid proxies. That being said, there are also a number of nefarious paid proxy services which offer residential IPs (read: are usually botnets).
The Twitter API has very low rate limits (from a data collection perspective). While there may be good reasons for that, these limits also preclude doing public interest research of the type we were doing (how Twitter's various search filters influence the political leanings of search results). When companies have Twitter's level of societal influence, I think it's also possible to define "legitimate use" in terms of public interest, rather than simply "users" or "site owners."
It is definitely super annoying that companies are allowed to spy on us and do all kinds of crazy things with our data, all using computers and automation and "bots" and such, but individuals are increasingly not allowed to use automation to help us out online. Seems rather one-sided. On the other hand, I get that abuse is a huge problem. I do wish at least bots operating at roughly human request rates & daily total requests were considered OK and universally allowed without risk of blocks or other difficulties leading to increased maintenance costs (so, making them less valuable).
Sometimes the scraping situation gets kinda ironic. I worked at a large eRetailer/marketplace and obviously we scraped our major competitors just as they scraped us (there are four major marketplaces here). So each company had a team to implement anti-scraping measures and defeat competitors' defences. Instead of providing an API, everyone decided to spend time and money on this useless arms race.
Absent someone breaking really far away from the pack, that's a classic example of one type of "bullshit job" called out in Graeber's book... Bullshit Jobs. Zero-sum, ever-escalating competition. Militaries are another obvious example (we'd all be better off if every country's military spending were far closer to zero—but no one country can risk lowering it unilaterally, and may even be inclined to increase theirs in response to neighbors, which sometimes gets so insanely wasteful that you see something like the London Naval Treaty or SALT come about in response) but so is a great deal of advertising and marketing activity (you have to spend more only because your competitor started spending more—end result, status quo maintained, but more money spent all around)
I wonder how anyone in IT could take Graeber seriously. One of his opinions about programming was that programmers work "bullshit jobs" for their employer and do cool open source stuff in their free time which is demonstrably false.
The presentation of that in the book, based off a message from someone in the industry, doesn't seem out of line with the overall tone and reliability-level that Graeber explicitly sets out in the beginning, which is both that the book is not rigorous science and that it's mainly concerned with considering why people's perceptions of their own jobs would be that they're bullshit.
[EDIT]
> One of his opinions about programming was that programmers work "bullshit jobs" for their employer and do cool open source stuff in their free time which is demonstrably false.
Further, I'm not even sure that's incorrect. It can both be true that most open source (that's actually used by anyone) is done by people who are paid to do it, and that most programmers have very little interesting or challenging to do at work unless they work on hobby projects—maybe open source—in their free time.
The overall letter as quoted in the book, and Graeber's commentary on it, actually makes some good points aside from all this. Things don't have to be perfect to be useful.
A lot of data I provide to services is exposed to other individuals so that the service can function. That doesn't mean that data belongs to those people or that they can freely use that data elsewhere.
Allowing unfettered scraping and repurposing of data would have a chilling effect on all types of services. For example I wouldn't necessarily want a bot to scrape my comment history on HN, doxx me, and share my identity and comments with others.
I believe whenever the “no automation/scraping/bots” clause in Ts&Cs has been tested in court, it has never held up. However, that’s not to say a service can’t just cancel your account if you are found to be using one.
I run a site that’s had a bot get stuck in a loop and suddenly send 10,000 times the usual request rate. When they go wrong it’s super annoying for the website owner. We ultimately just banned the whole AWS IP ranges.
"Nefarious" is a strong word. Courts have repeatedly ruled that scraping data that is otherwise available publicly is legal. You may not personally agree with the ethics, but there are a lot of people who do.
It is very far from a DDOS tool. Scraping can be done from a single source, one request at a time, with self imposed rate limits. Sure it can overwhelm a server, but then so can a single user opening 10 tabs.
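A self-imposed rate limit can be as small as a generator that spaces out the work. A minimal sketch, not tied to any particular scraping tool; the throttled name, the URLs, and the 0.1 second interval are all made up for illustration, and the actual fetch is left as a print:

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(items: Iterable[str], min_interval: float,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield items no faster than one per `min_interval` seconds."""
    next_allowed = clock()
    for item in items:
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)  # pause until the next slot opens
        next_allowed = max(next_allowed, clock()) + min_interval
        yield item


# Hypothetical URLs; replace the print with a real request.
for url in throttled(["https://example.com/a", "https://example.com/b"], min_interval=0.1):
    print("would fetch", url)
```

The clock and sleep parameters exist only so the pacing can be tested without actually waiting.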
That's not what this tool does though. It allows you to distribute your scraping to a layer of proxies. So, the only difference is whether there is an intent to do harm to the target or merely collect data... which could be a form of doing harm as well?
There are plenty of tools like this where going up to the line is much different than crossing it. There's a vast difference between driving your car to an event and driving the few extra meters into the crowd at an event. You can cut down a tree with a chainsaw or cut down a tree onto your neighbour's house.
There's definitely an argument that dangerous tools should be regulated to varying degrees. If we're arguing regulations in this specific area, you'd probably also be balancing it with regulations that sites can't close an account for reasonable-rate automated access and that public research can have higher rates so long as they're not crippling.
The tree example is true and why I agree these things are very similar. The only significant difference is when you put it on your neighbor’s house on purpose.
I wouldn’t regulate this, but if you’re introducing regulations, why not just require the source to deliver the data in a neatly packaged format? The necessity for scraping, and with it the potential for DDOS and other nefarious behavior, basically goes away.
Based on another comment, and the wikipedia article they linked to, it looks like the Supreme Court vacated the decision and remanded the case for further review in June 2021 (probably after this article).[1] Unfortunately there is no citation for that sentence so I'm not entirely sure.
I think that means the jury is still out, as you mentioned, but it's leaning towards scraping being legal as long as the data is publicly available. IANAL
Surely you must be joking. Alphabet is the largest web scraper in the world. They would soon go out of business if robots.txt was the only data they scraped.
It’s not a web crawler. They are all web scrapers. And Alphabet/Google sells this data and makes profits from it.
It is not like it is trying to hide the fact that it is king web scraper.
Google has gotten in trouble from various publishers for this before. It is no secret there is a double standard in big tech.
Again if you are going to arrest a web scraper, then arrest the king of all web scrapers first to make it fair.
Data wants to be free. If it is publicly accessible then it is fair game.
So, no source? Your response is unrelated to the statement at hand.
Think about it: Google has every advantage by respecting robots.txt and nothing to win by ignoring it.
Eg.
1) If a media company doesn't want to get crawled: add it in robots.txt
Then they realize their visitor numbers drop and they'll remove it again.
Ergo: publishers sue. Because they want the advantages, but without the scraping. Which doesn't seem logical to me, since they currently give Google explicit permission to scrape content.
2) if they would sometimes leak personal documents protected by robots.txt they could have a lot of lawsuits on their hands.
Robots.txt is a simple method to not get blamed.
Ignoring robots.txt could literally be a core business liability from my POV.
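For reference, honoring robots.txt is mechanically trivial for a crawler. A small sketch with Python's standard urllib.robotparser; the robots.txt content, the bot name, and the URLs are made up, and this only shows how a crawler can respect the file, not what any particular company's crawler does:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt; a real crawler would fetch it from the site.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("ExampleBot", "https://example.com/news/story"))    # True
print(parser.can_fetch("ExampleBot", "https://example.com/private/memo"))  # False
```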
---
So please, source outside of gut feeling, as requested before, would be greatly appreciated.
Google scrapes web data is my point. It is king web scraper.
Robots.txt does not fit into this argument. I’m not sure why it was brought up. Google doesn’t scrape URLs listed there? Ok. And so? Am I to believe that just because Google says so?
Google scrapes what it wants. It does so for its shareholders. It couldn't care less about web standards.
Nice! I've often wondered what proportion survive. Tbh, I've launched about a dozen things on PH and it's not realistic for every product to be a success. You learn by your bruises so I'd be surprised if most founders didn't have a string of failed launches behind them.
Interesting to see the categories that had the best responses include no-code!
> there's actually proportionally less failures in Product Hunts busiest period
This is a really interesting post! I think there's a little survivorship bias. As Product Hunt grew 2015-2017, users posted old projects of theirs which were already popular and successful.
My guess would be that URLs for the categories eliminated after that period (eg. Books and Podcasts) are more likely to remain stable and available, even if the product was a flop.
ProductHunt has become a cesspool of spam and people gaming their voting system to appear as a featured product. Ryan Hoover is too busy with his web3 projects and investments to care.