I'm still waiting for the option to see the ads that I want to see. I want to see movie trailers, something that I rarely see now because I don't watch TV. I want to see new video games. I want to see new books, what sports events are going on, etc. Why not ask me? It's literally not rocket science here but billions are spent on machine learning, clicks, storing exabytes of data or more trying to figure this shit out.
Just ask me for fuck's sake, I'm more than willing to watch ads in exchange for a useful service!
With the web sites I run, I thought it would be a service to users to display ads appropriate to the content of the page. For example, if the page was about bash programming, you might see an ad for a book on bash.
But there turned out to be NO WAY to specify a category of ads to run on a site. It was all their algorithm. Hence my sites on programming would keep running ads for "Batman Returns". How utterly pointless and boring.
I had a page on the Revolutionary War, and the affiliate ads would be for travel agencies. What would have worked would be books on military history.
I eventually gave it up.
I.e. the ad should be on-topic to the specific web page, not specific to the reader. Not only would this be much better targeted, gathering profile data on the user would not be helpful.
Amen. I find myself returning to something I said 4 years ago about the limit of acceptable ads: https://news.ycombinator.com/item?id=10521930
Target the intent or interest, not the user.
A byproduct of these searches and this browsing is that I will often get ads for products for women. Dresses are fairly common.
If I spend time digging around for sheer fabrics or lace I end up getting served ads for women's undergarments. I have gotten quite a bit of side eye in office environments because of this!
The problem is that movies, travel, and women's fashion all have the most money to throw at marketing, and they get the most play because they result in the biggest payout for the people running the ads. It isn't any different from beer and cigarettes being in EVERY publication that had a male-leaning demographic.
Sounds like a business opportunity.
An ad network where the "user interests" are supplied by the publisher (and maybe the user-agent, see below), not by the ad network. The network doesn't need cookies for the user and doesn't track the user at all - the analytics are based exclusively on the data provided by the publisher/website, not the user.
This way, the publisher might target on the content (plus maybe known user preferences/past activity on the site, while logged in). Sure, the publisher might also track the anonymous users & supply anonymous profiles, but at that point, why bother? Just use DoubleClick like everybody currently does.
You could even imagine integrating with the user agents. If I'm interested in ads for cars, I might willingly supply the information about "what kind of ad I'm interested in". In my own "advertising profile", that I keep under my control. Sounds like having a user-controlled advertising profile is a win for everyone - both users & advertisers. I'm not quite sure why nobody does it.
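A minimal sketch of what such an ad request might look like. Everything here (function and field names, the shape of the profile) is invented for illustration; no real ad network API is implied:

```javascript
// Hypothetical contextual ad request: targeting data comes from the
// publisher's page metadata plus an optional user-controlled profile,
// never from cross-site tracking cookies.
function buildAdRequest(pageTopics, userProfile) {
  return {
    // Declared by the publisher, derived from the page content.
    contextTopics: pageTopics,
    // Volunteered by the user agent; empty if the user opts out.
    declaredInterests: userProfile ? userProfile.interests : [],
    // Note: no user identifier of any kind is included.
  };
}

const request = buildAdRequest(
  ['bash', 'shell-scripting'],
  { interests: ['programming-books'] }
);
console.log(JSON.stringify(request));
```

The point of the sketch is that the request is fully reproducible from the page and the user's own declared profile, so there is nothing for the network to track.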
Same goes for guitar magazines. Of course interviews are good, but the ads are how most guitar players learned about new pedals and other things we need (or want to need).
And ironically, computer magazine ads used to be the info source when it came to new hardware or software.
The problem is that Internet ads come with the risk of infecting your system with shit nobody wants, 20 trackers completely unrelated to the product being advertised, and the fact that they much too often rely on shady tricks like wobbling or blinking or pretending to be something else just to annoy the fuck out of you enough that you'll click in the hopes of getting rid of it.
That's why ads are such a problem. Not that we don't want ads. What we don't want are the risks and annoyances that the Internet has brought to them.
Delivering one ad that results in a confirmed purchase is much more lucrative than multiple plain page views, or even clicks. The result is twofold:
1- trying to predict who's ready to purchase something, right now.
2- trying to prove that someone who's seen an ad actually made a purchase.
This is why Gmail for free made sense from day one. What better way to verify a purchase than by having access to emailed receipts? This is also why Google's been buying credit card purchase histories from Visa/MasterCard/banks etc.
Separately, there's predicting one of the big life events: college, wedding, baby, home purchase. If I'm remembering correctly, getting a jump over competitors on one of these can be worth hundreds of dollars.
There's the story of Target using purchase behavior to try and predict pregnancy, so that they could send targeted / trackable coupons to the expectant mothers. They sent one such packet to a 16 year old girl, whose parents immediately threw a fit about how inappropriate it was, and how dare they accuse their innocent little girl of being promiscuous...
Turns out not only was she pregnant, but Target knew before even the girl realized it.
The kicker: this story was reported back in 2012.
All of this to say, targeted ads might make sense, but proven effectiveness pays more. For that, tracking's pretty much required.
Amazon, please let your affiliates at least pick the category! Then use keywords on the web page to pick the product in the category, hopefully at random. Seeing the exact same ad over and over is counterproductive.
The advertisers figure that self-reported data may be faulty. In some cases it could be-- either maliciously ("I'm 92 years old and spend $600 per month on my phone bill!") or by omission ("I hadn't even thought about it, but my lease is up in 3 months and high-yielding car ads might well be useful for me")
Consumers won't be easily convinced they're being taken seriously-- if they have to go out of their way to customize ads, and then see equal or more ads than before, or non-laser-focused ads, their trust goes out the window. It's also going to be difficult to stand out in a sea of banners and say "we're the high quality ad you customized... right next to 32 click-your-state-for-mortgage-broker-lead-arbitrage banners."
I suspect there may also be a third distrust-- between ad networks and advertisers. If you ran ads for a product only on a really targeted audience, the figures may not look compelling to buyers. It might look better to say "This $10k ad campaign landed in front of one million weakly targeted eyeballs... and generated 40 sales" than to say "We spent $10k on a super-premium media blitz that hit 200 hand-selected individuals and sold 45 units."
I am a consumer and I don’t want the cognitive pollution associated with having images and sound shoved in my face. Full stop. Separately, it's worth mentioning that introducing limitations deliberately to get people to pay more is also evil (I can play YouTube on my desktop and switch tabs, but on iOS, for example, it stops playback so you'll buy YouTube Red).
EDIT: Offer side/bid side is ambiguous since the model is “inverted” with advertising. I’m leaving my comment as is, but it's probably better to say “bid side” in this specific case.
This is a fascinating claim to me. Selling access to a service is presumably okay, but having a limited-features version available for free use is "evil"?
Rather, that you get a lesser service on a different platform unless you pay. Consumers expect the same experience regardless of platform. When that experience is free on some platforms but paid on another, it frustrates.
I consider it a bug they've added purposefully and they request each user pay for.
This is not the revenue-maximizing ad. Expected value for the company buying the ad is proportional to the product of conversion value and likelihood to convert. "Ads that I want to see" corresponds to the latter, but is completely uncorrelated to the former.
For example, advertisers would rather sell a 1% chance of generating a valuable asbestos lawsuit lead than a 100% chance of generating a $2 lunch restaurant lead.
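In rough numbers, the expected-value argument above looks like this (the $1000 lead value is invented for illustration; only the comparison matters):

```javascript
// Expected value of showing an ad = conversion value × probability of converting.
function expectedValue(conversionValue, conversionProbability) {
  return conversionValue * conversionProbability;
}

// Illustrative numbers: a legal lead worth $1000 at 1% odds
// still beats a $2 lunch lead at 100% odds.
const asbestosLead = expectedValue(1000, 0.01); // $10 expected
const lunchLead = expectedValue(2, 1.0);        // $2 expected
console.log(asbestosLead > lunchLead); // true
```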
Figuring out that things went basically okay from logs is much less intrusive. Surveys should be saved for more important things.
For a large consumer app (and actually, that's less and less reserved to only the largest apps), you can expect that every time you see a widget, or tap anywhere in the app, it is going to trigger some kind of analytics log.
The goal is not to determine your profile in order to sell you ads but to understand how people use our product (in aggregate).
Let's say you have a checkout feature. You need to add an address, add a payment method, and tap checkout.
If a significant % of your users bail out of the checkout flow in the address step, there might be something you want to investigate.
We also do user studies, they are super useful but are harder to generalize.
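The checkout-funnel analysis described above can be sketched in a few lines (event and step names are made up for the example):

```javascript
// Aggregate funnel analysis: count how many sessions reach each checkout
// step, then see where users drop off. No individual profiling needed.
function funnelDropOff(events, steps) {
  // events: [{ session, step }]; steps: ordered step names.
  const reached = steps.map(step =>
    new Set(events.filter(e => e.step === step).map(e => e.session)).size
  );
  return steps.map((step, i) => ({
    step,
    reached: reached[i],
    // Fraction of the previous step's sessions that continued.
    continued: i === 0 ? 1 : reached[i] / reached[i - 1],
  }));
}

const events = [
  { session: 'a', step: 'address' }, { session: 'a', step: 'payment' },
  { session: 'b', step: 'address' },
  { session: 'c', step: 'address' }, { session: 'c', step: 'payment' },
  { session: 'c', step: 'checkout' },
];
console.log(funnelDropOff(events, ['address', 'payment', 'checkout']));
// In this toy data, 1 of 3 sessions that entered an address completed checkout.
```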
And it's not just obvious ads. You can't trust online customer reviews, because there's an industry devoted to spamming them, both positively and negatively.
And you can't trust third-party reviews, because they're often disguised press releases. Or biased by payment. For example, I've been told that many sites that review VPN services basically auction their rating slots.
Some platforms have awakened to this, so they have rules or guidelines for what to do when a customer tries to blackmail them into free stuff with negative reviews. However, not all of them have.
I think someday someone's going to realize just how silly the advertisement game is, and as long as the payment structure is in place, we can get a much better web experience.
For example, many of us pay a small monthly fee for Netflix. I'm sure that a small monthly fee could add up to more than what most sites make from ads.
You're not the first, or tenth, or millionth person to think of this. Hell, even just limited to HN, micropayments and general content subscriptions have been discussed for a decade. Consumers are, in a way, at an equilibrium where they don't want to pay for web content (esp text web content), and the path to getting them to the equilibrium of paying without thinking about it (like with Netflix or power) is unclear.
It's not just theoretical: Companies like Google have also been experimenting with this for yeaaars, to diversify away from the risk (whether regulatory or technological or otherwise) of relying on ads as a primary revenue source. There are complications beyond consumer behavior, like bringing the colossally complicated ad ecosystem under a single payments system (since nobody wants to pay for a service that only removes some fraction of ads from the web).
Personal subscriptions for websites might work for those who are big enough, but "big enough" means entities far larger than ones which probably should have been trustbusted several decades ago, and the market is still too fragmented for that purpose. Not helping is that publishers always expect far more from a subscriber than they make from an ad viewer - because so few bother to subscribe.
You disprove yourself by mentioning Netflix. The path is absolutely clear: Customers are willing to pay for added value that's proportionate to the cost.
The problem for publishers is they do not add any value that would justify customers paying enough for their content. Few people will pay for a newspaper subscription when there are 10 other newspapers offering 90% of the same content for free.
There are models that work, e.g. Patreon, but those usually don't scale up to, say, the Washington Post or CNN.
This isn't how equilibria work. Netflix was a superior product to piracy in many ways: no perceived legal risk, reliable access, high quality guaranteed, way better ease of use. These barriers were high enough that plenty of people didn't pirate at all and stuck with nonsense like DVDs for way too long, so the incentive path pointed smoothly towards switching to Netflix, a Pareto improvement for non-pirates and a fairly easy trade-off for pirates.
There's no such path for web content: adblockers are unquestionably legal, easy to set up, provide a better experience, and even non-users of adblockers have a trillion non-paywalled sources in an ecosystem where it's tough for strong brand loyalty to survive en masse. What advantages do you imagine a paywall option offering to people when their alternative is better in almost every respect?
> There are models that work, e.g. Patreon, but those usually don't scale up to, say, the Washington Post or CNN.
What I think is more likely--but still pretty speculative--is that an aggregator (like Apple News) could create a sufficiently large stable of publications to offer as a subscription competitive with the handful of pubs like the New York Times and Wall Street Journal that are strong enough brands to go their own way. One thing that's very unclear is whether the mass market is willing to pay the cost associated with that subscription. Probably not.
Today's evidence suggests that people are generally more open to subscriptions than pay-as-you-go for content. Music in particular has pretty much transitioned to subscriptions.
That being said, I think this is one of the biggest hurdles in the transition from where we are to a payments-based system.
But at this point the trust is so broken that I probably wouldn't pay for it even if it did exist. Because I'd expect whatever beacon is being used to say, "Paid up, don't serve ads" to just be used as another way to de-anonymize me by the less scrupulous advertisers. Which, as far as I can tell at this point, is pretty much all advertisers.
As I understand it, sites don't earn very much from each ad view. So to get substantial income, they need lots of traffic and ad views.
And this gets implemented through a hugely complex process of real-time data sharing and bidding. It's ~transparent if you have a fast uplink. But if you're using Tor browser, you can watch it play out, all too slowly.
So just replace that complex process with a simpler process of identifying the subscription service that the user employs, and adding the page view to their tab.
In order to match current income from ads, the cost per page view would likely be very small. Perhaps $0.01-$0.10. And there could be a mechanism for adjusting that cost based on the local cost of living of a given user. Perhaps through a parameter pushed by the subscription service, which it would obtain in some more-or-less anonymous fashion.
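The accounting for such a tab could be as simple as the following sketch (the $0.02 rate is an invented example within the range above, and the site names are placeholders):

```javascript
// Sketch of a per-page-view subscription tab: each view adds a small
// charge to the reader's monthly bill, split among the sites visited.
// The rate is an illustrative assumption, not a real price.
const RATE_PER_VIEW = 0.02;

function monthlyTab(pageViews) {
  // pageViews: [{ site }] - no user identity needed beyond the account itself.
  const perSite = {};
  for (const view of pageViews) {
    perSite[view.site] = (perSite[view.site] || 0) + RATE_PER_VIEW;
  }
  const total = Object.values(perSite).reduce((a, b) => a + b, 0);
  return { perSite, total };
}

const tab = monthlyTab([
  { site: 'news.example' }, { site: 'news.example' }, { site: 'blog.example' },
]);
console.log(tab.total.toFixed(2)); // "0.06"
```

The subscription service would then pay each site its share, with the cost-of-living adjustment mentioned above applied to the rate rather than to the mechanism.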
Unless there's regulation against advertising-based business models, nothing will change, because competitive pressure will always push towards free+ads, paid+ads, and/or free/paid+ads+data collection.
As in, I'd pay $X/month for something that does that for most sites I frequent. But I'll never sign up for 100 different sites individually.
Same thing for newspapers.
Mostly because revealed preferences.
I don't ever need to see car ads, because I live in NYC. Nor for medications for conditions <x, y, z> which I don't have. Nor for that Amazon product which I already bought.
I am so happy to spend three minutes filling out my interests profile. In fact, I already did it for Google when someone pointed out you can. But for all the many ads served by other ad networks, no dice. :(
Because users lie, intentionally and unintentionally?
It's not about what you like, it's about what you need or think you need.
Probably it's because your interests change over time. And because they want to profit on what you need today. You might like cars, but you don't buy cars every day, just as I don't buy guitars every week.
I think they want to profit from your immediate need, and that requires spying on you, your messages, your browsing history, etc.
They do, and the sites I visit change with them too. Visiting a site about something is as clear and unambiguous a signal about my interests as you could possibly get.
The reason it's totally disregarded is, I believe, because it's simpler and more profitable to run ad networks as a market. Neither the publisher nor the ad network really care what ads are being run, they care about maximizing profits. And some advertisers (e.g. universally appealing product categories like clothes and movies) can easily outspend niche sellers.
As it is today, advertising is an act of malice, so don't expect anyone to care about the user end of the equation. You can get a better ROI by showing more user-aligned ads, but you can also get a better ROI by doubling down on the surveillance capitalism, and the latter requires less coordination.
This is why UX/usability studies are the gold standard of delivering value: put a user in front of your tool, give them a task and watch the actual actions they make to complete the task. This has been the norm for decades.
Seems to come from the same place as the belief that the Cloud magically makes everything resilient and scalable without any extra effort on your part. Just put it in the cloud, and then give your CTO a bonus for suggesting the cloud, and suddenly you don't need to worry about sysops.
Don't get me wrong, "log all the things" is a good place to start when you need to figure out what's actually worth logging - but it needs to be followed by a rigorous prune.
Otherwise your data-lake turns into a data-swamp, you collate a lot of noise that makes it harder to find signals, and people eventually end up spending a lot of time trying to figure out what's actually used, if anything, when Hadoop gets full or the S3 bill gets too high.
(And perhaps explains why my spell corrector camelcased "devops")
Check its log files.
Under GDPR, IP addresses can be considered PII, so it makes sense to set up an anonymizer for nginx ip address logs. There is a great Stack Overflow answer on this: https://stackoverflow.com/questions/6477239/anonymize-ip-log...
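The pattern from that answer looks roughly like this (a sketch to adapt, not a drop-in config; the log format here is a trimmed example):

```nginx
# Zero out the last IPv4 octet (and truncate IPv6) before anything is logged.
map $remote_addr $remote_addr_anon {
    ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;
    ~(?P<ip>[^:]+:[^:]+):       $ip::;
    default                     0.0.0.0;
}

log_format anonymized '$remote_addr_anon - $remote_user [$time_local] '
                      '"$request" $status $body_bytes_sent';

access_log /var/log/nginx/access.log anonymized;
```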
But also there's some app hygiene involved. At least one of the recent "data breach" notifications involved not an actual leak of personal information, but unsanitized logs containing personal information that should not have been shared intra-organizationally. I forget the company that did this, but they notified as if it had been a breach even though passwords had just been logged internally.
When testing it's convenient to do stuff like
console.log('username: ', req.body.username);
console.log('password: ', req.body.password);
but it's all too easy to forget about it when you're working on a million things. So a big part of the solution is mindfulness (do I _really_ need to log this?)
And no, I don't think in most cases it's forgetting to remove the `console.log(req.body.password)`, but rather having a much wider `console.log(req)` which you didn't realize contains (or could in some other code path contain) a password. Or some log statement much deeper, 2-3 layers of abstractions away, logging some struct passing through the system, which happens to contain PII.
It definitely isn't a trivial issue with a simple solution, as some people commenting on such headlines seem to imply.
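One partial mitigation, sketched here with made-up field names, is to scrub known-sensitive keys centrally before anything reaches the logger, so a wide log statement deep in the stack can't leak credentials:

```javascript
// Recursively redact sensitive fields before logging. The field list is
// illustrative; a real app needs a vetted, maintained list.
const SENSITIVE = new Set(['password', 'token', 'creditCard']);

function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === 'object') {
    const out = {};
    for (const [key, val] of Object.entries(value)) {
      out[key] = SENSITIVE.has(key) ? '[REDACTED]' : redact(val);
    }
    return out;
  }
  return value;
}

console.log(redact({ body: { username: 'alice', password: 'hunter2' } }));
// { body: { username: 'alice', password: '[REDACTED]' } }
```

A deny-list like this is only a backstop (it misses renamed fields), which is part of why the problem isn't trivially solved.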
> Under GDPR, IP addresses can be considered PII
is wrong. First of all, PII is a legal definition, mostly from the US. The GDPR talks about Personal Data, which is different. If a consultant is talking about GDPR and PII, he is confused; stop listening. Have fun reading the details here: https://gdpr-info.eu/art-4-gdpr/
Which doesn't matter all that much, because the next part of the line is way too broad:
Is an IP address Personal Data? Maybe, if you can use it to track an individual. But if you aren't an ISP and don't actively try to identify anybody with an IP, stop worrying.
Next question: Do you need it? In general, using them to keep the website running is normal usage. So no need for consent. Using them for attack prevention might actually be an industry best practice, and then the GDPR requires you to keep them.
Next question: If you need them, how to keep them safe? Throw logs away after a while, encrypt backups, etc...
The GDPR has been hijacked by consulting companies to extract money from everybody, so they do their utmost to sow paranoia with all kinds of weird urban myths. Don't believe it. Basically, follow normal IT best practices and stop worrying.
One thing I wonder about is what you would do if, say, you have an abuser on your site that you need to ban due to behavior detected after the fact through a log file.
If one needs their IP in order to ban them, but their IP is anonymized, what do you do?
If you store anything else about the user (their firstname/lastname) and can make a relation between this and the IP (e.g. you can see that this IP went to the page myprofile.php?id=438098 at 23:10 yesterday), then you should already have somewhere where the user can see why you store their firstname/lastname. Just add "IP" to the list of data stored, for the purpose of keeping your systems safe and accessible, warn that you'll store it for 30 days (after which the logs are purged), and then you're fine.
In GDPR terms, we have obvious legitimate interests in being able to identify repeat offenders trying to abuse our system in some way, in being able to identify recurring problems with how our systems are operating, and in tracking long term usage patterns. These interests combined with the very low risk of any adverse consequences should any of the relevant logs leak make our policy of indefinite retention in this specific instance compatible with GDPR in our view.
Incidentally, we have in fact identified repeat abusers returning to our site several years later based on their access patterns and IP addresses as recorded in logs, so there is even an objectively demonstrable long-term threat should anyone ever want to question this policy.
I work in gambling industry, and we are required by the regulations to keep ALL user information, including the KYC documents they submit, on file for a minimum of 5 years after their last activity. If you are looking for toxic data stores, this is among the worst ones there is. Limiting access to that data is crucial, and making sure it's not misused is mandatory.
It could be worse. There are domains with more demanding data retention requirements: insurance and consumer finance in particular.
> "The formula only holds for comparable users who will be using the site in fairly similar ways"
This is the article where Nielsen breaks down what is being tested, and why it is statistically relevant.
Nielsen is looking to solve HCI(1) and Human Factors(2) issues - and most of these are byproducts of having to have a deep (insider) understanding of a product and that bubbling up into your UI. You are going to catch a lot of errors that fit the adage "can't see the forest for the trees". Having sat through a LOT of these tests, you will pick out user frustrations and reasons for product abandonment that would likely be non-apparent in a log.
Your examples of US/China, TX vs CA, and GRE with race and class MIGHT be relevant, but it is going to depend a whole lot more on what you're building. The problem is there are other means and places where these issues might manifest, and again user testing would tell you a lot.
If we were to build a VR game that used a chopstick-like interface, and test it only in China, we would likely think that we had a good product. If we find out later that "this isn't selling in America", then testing in that demographic group would quickly give us the insight that people lack the muscle memory to use it intuitively. There isn't any log in the known universe that would give us that clue, and "test here" can (and likely would) be gleaned by other means.
When you get past HCI and Human Factors, log data can be useful, and can be a contra-indicator of the results of formal testing. Offering a choice between A and B in a formal setting may yield one set of results even with a large sample size, but real-world behavior turns out to be very different. This is akin to people slowing down when they see a police car - but driving fast when one isn't present - or kids acting differently because they know someone is watching. We aren't discussing UI and UI interactions anymore; we're now discussing human behavior and preference. I can't tell you how many times I have seen the non-preferred solution be the winning one in an A/B test; I would generally bet against what the group likes and pick the most garish solution as the winner.
These behavioral types of tests can only really be driven by logs, by people being themselves and "feeling" unmonitored, and by accurate demographic (to your point) slicing and sorting. En masse, people are far more predictable than they would like to believe. We're delving into something more along the lines of Asimov's Psychohistory(3), as I don't think these sorts of statistically predictable behaviors have been given a formal name.
Yes, statistical sampling is a hugely useful practice, and is frequently used, at least by those who are familiar with its power and capabilities.
Depending on what you can see, it may or may not be particularly useful. For activity logs, you are getting a bunch of relevant information, though if you stick to just sampling log records, you may miss useful information, such as paths through a site, session data, and the like.
In doing analysis of the scale and scope of usage and activity of the late and unlamented Google+, I had the opportunity to sample based on profile IDs, which Google had helpfully stashed in a set of robots.txt sitemap files, back in 2015. More recently, when seeking information on the number, size, and activity of G+ Communities (effectively: groups), I could perform a similar sampling based on the group IDs, also provided via sitemaps.
For a basic assessment of how many active users and groups there were, a small sample, as few as 100 or so IDs, selected at random, were sufficient to give a general feel. But there's a lot of variance hidden in 2 billion registered users (as of 2015), or the 8 million Communities existing as of January 2019. And for detailed measurement of the most active users and groups, a very small fraction of the total (0.1% of users, and the top 50 or so of 8 million communities, or 0.000625%), the relative sampling population wasn't the total user or group count, but that small subset, randomly distributed throughout the whole, comprising that sample of interest.
To find the very most active users and groups, in other words, you have to sample a lot of datapoints.
(Mind: if I'd had log data, they'd have fallen straight out of that. I didn't. Which is itself another lesson: in most cases you're interested in activity and not population as a primary analysis variable.)
Given my tools and methods -- requesting URLs and scraping, from a desktop system over residential broadband -- there were limits to the amount of sampling I could do. 50,000 profiles were doable in a couple of days, but a larger pull would have scaled linearly in time. For Communities, I did a largish pull based on a minimum level of resolution I thought would be useful, based on 12,000 (again, randomly selected) Communities.
In the end I lucked out as a third party was able to provide a comprehensive dataset of all 8 million communities and summary metadata, from which I could validate my earlier sample-based methods.
But yes, working with hundreds or thousands of records, rather than millions or billions, often makes sense, is useful, and requires vastly fewer resources (compute, time, bandwidth).
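The core estimate described above can be sketched in a few lines (the `isActive` probe here is a stand-in for actually fetching and inspecting a profile or community page):

```javascript
// Estimate the fraction of active accounts by probing a small random
// sample of IDs, rather than crawling the full population.
function estimateActiveFraction(ids, sampleSize, isActive, rand = Math.random) {
  let active = 0;
  for (let i = 0; i < sampleSize; i++) {
    const id = ids[Math.floor(rand() * ids.length)];
    if (isActive(id)) active++;
  }
  const p = active / sampleSize;
  // Rough 95% margin of error for a proportion from a simple random sample.
  const margin = 1.96 * Math.sqrt((p * (1 - p)) / sampleSize);
  return { fraction: p, margin };
}

// Toy population where ~10% of IDs are "active" (a stand-in for scraping).
const ids = Array.from({ length: 100000 }, (_, i) => i);
const { fraction, margin } = estimateActiveFraction(ids, 500, id => id % 10 === 0);
console.log(fraction, '+/-', margin);
```

As noted above, this works well for population-wide fractions but not for pinning down the extreme tail; rare heavy users need a much larger sample to show up reliably.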
For getting a rough idea of just