Google Optimize now free for everyone (blog.google)
164 points by gmays on Sept 4, 2018 | 50 comments


"Optimizing" UI for session length and clicks is the same thing as finding the local maxima of bad, right?


The effects of this are easiest to see with online recipes; the highest-ranking recipes are all thousand-word ramblings with a recipe tacked on at the end. Google sees you spent more time on the site (i.e., wasted it scrolling) and thinks you were more 'engaged.'


I noticed this too and made a plugin that saves me a lot of time; it copies the recipe into a container and puts it in a modal at the top of the page with the original formatting intact.

Chrome extension: https://chrome.google.com/webstore/detail/recipe-filter/ahlc...

Source: https://github.com/sean-public/RecipeFilter

Video demo & explanation: https://youtu.be/3Xq1p10f3v4


This is awesome and will save me a lot of time. Thanks!


Wow, I hadn't made that connection! I just kept wondering why all recipe sites had devolved into a fiendishly consistent kind of stupid.


> Google sees you spent more time on the site (i.e. wasted scrolling) and thinks you were more 'engaged.'

That seems specious. By that reasoning we could predict long-form articles would triumph over short ones, but that doesn't bear out.

It seems just as plausible that there's a population of readers who do like those terrible rambling stories and tend to be more loyal to a site if they do, vs. a large population who will just click on whatever recipes show up at the top of the search results with a reasonable-sounding recipe name.

Recipe sites also rip each other off all the time, and it would be difficult to tell where the authoritative source of a recipe was. The personal story part is harder to rip off without being caught as a scraper (and may then boost your SEO prospects).


No one scrolls to the end of a long-form article to get a recipe. Instead, they just close the tab and say "too long; didn't read."

Your other points are good, though.


> No one scrolls to the end of a long-form article to get a recipe.

I highly doubt this is true. Scrolling to the bottom of a page is a very low cost to pay for a recipe.


That was an explanation of why long articles that don't have recipes at the end aren't consistently ranked higher than short articles that don't have recipes at the end.


Ah, I see how I completely misread that. Thanks for clarifying!


You don't care about how that cookie recipe that only uses four ingredients changed their whole life, how their children just love it, and how the whole office goes nuts when they bring them in?

Yeah I don't either, I just wanna know how to make the damn cookie.


I guess it is up to you to define the metric to be measured and optimized. Defining the right metric and targets is not easy and is probably business-specific, i.e., not the same for every business.


Seriously, +1 to this. Defining key metrics is something you should sit down and do with a data scientist who is familiar with your domain. In my experience, the problem isn't that people can't A/B test; it's that they don't know what to test or how to determine whether an experiment was "successful".


You are right. Another detail is to find a metric that is meaningful in the long term, e.g., one that is in line with the long-term happiness of your customers.


My user experience on Google and most websites has continually deteriorated over the last few years.


Gradient descent aims to please the average consumer, which is often not what is in demand in the marketplace.

Had Henry Ford used ML, he would've invented a faster horse.


That made me laugh, but it's true.


When I look at all the flashy stuff Apple is planning on jamming into macOS and iOS this fall, I expect the same will happen with my UX on those platforms. Who needs these features?


It depends on your goal. If you want to show more ads for longer, it's the local maximum of profit. If you want to provide something like a quick reference guide, then yes, it's terrible. Instead of optimizing for some arbitrary metric, you should focus on doing whatever you're doing right.

Behavioral analytics on user interaction can help you fix a bad design but shouldn't matter that much.


Not necessarily - if I have a website that is trying to offer information to people about something, then it's better for everyone if it's more navigable/understandable, which session length / clicks might be a proxy for.

That said, it certainly can be.


If your users find what they need right away, session times will go down.


And perhaps total and unique visits will increase as they share it with others. At least, I wish it were that way.


If session times go down, there is a lower chance users will notice the ads, and Google wants them to see the ads.


This is a freemium product. There's an enterprise upgrade that costs about $10k per year, I believe. It's nice though, I use it on my site.


Additionally, it's a freemium product where Google can steal all your ideas, because you just gave them the license.

(not necessarily true, but it has happened before with YouTube)


'Optimize' has been around for quite some time, just in case anybody thinks this is new. In fact, I believe this feature has been around for over a year now.


Does anyone have a rule of thumb on when A/B testing becomes important for startups?

We have a few thousand visitors a month and are starting to convert, but my guess is A/B testing language and buttons would be premature optimization for us. Just curious at what point that's no longer the case.


The last heavily trafficked site I worked on wouldn't perform an A/B test unless they could experiment with tens of thousands of daily active users. The experiments would last 2-3 weeks to gain statistical significance. A page usually had a 70/30 control/experiment split.

The challenge is gaining statistically significant data. I think it is easier for an early-stage company to talk to its customers than to go through the time of a split test.


Some pretty easy rules of thumb, assuming you have a decent grasp on your economics. Look at it as a "low hanging fruit" optimization problem -- do you put resources into running a test (+ opportunity cost for lost sales), or into something else?

Suppose you have 10k monthly sessions with a 0.5% conversion rate (50 conversions). How many more customers would you need in order to prioritize running a test? If 55 conversions in a given month means you crush important KPIs, then that's probably worth testing -- you just need a 10% lift.*

Also keep in mind that running A/B tests (1 control, 1 treatment) is suboptimal. That only tests "does this beat what I have now?" The more important question is "what is my best option?"

OTOH, if other things like messaging and product are stable, you can test a smaller-traffic site by leaving the test running longer.

My rough estimate is 100 conversion events in the time the test runs. So if I have 100 conversion events in 1 month, it may make sense to run a 2-3 option + 1 control test for 1 month.

(You can also test much larger things than buttons. For startups, I like to suggest trying out positioning or value statements and seeing how visitors respond!)

* however, it'll take a long time for you to reach statistical confidence for a 10% lift in rate, with only 50 conversion events across all tests.
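
To put the footnote in numbers, here's a minimal sketch of the standard two-proportion sample-size formula (normal approximation), plugging in the hypothetical 0.5% baseline and 10% relative lift from above. Plain Python, just for a back-of-the-envelope feel, not a substitute for a proper power analysis:

    # Visitors needed PER ARM to detect a relative lift with a
    # two-sided z-test at the given significance level and power.
    from statistics import NormalDist

    def sample_size_per_arm(p_base, rel_lift, alpha=0.05, power=0.80):
        p_var = p_base * (1 + rel_lift)
        z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
        z_b = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
        p_bar = (p_base + p_var) / 2
        num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
               + z_b * (p_base * (1 - p_base) + p_var * (1 - p_var)) ** 0.5) ** 2
        return num / (p_base - p_var) ** 2

    print(round(sample_size_per_arm(0.005, 0.10)))  # ~330,000 visitors per arm

At 10k sessions a month split across two arms, ~330k visitors per arm is effectively never, which is the footnote's point.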


There are some calculators that help you figure this out. Here's one I just found with a Google search: https://www.optimizely.com/sample-size-calculator/

The idea is that to get results, you need some combination of a lot of data, or a big impact from the changes you're testing. If you just change the color of the signup button, it probably won't have a major impact on the conversion rate, so you'll need a lot more data to reach a conclusion. But if you test a completely new landing page, it might have a better chance of being meaningfully different (better or worse, who knows until you test?) and so you wouldn't need as many visitors to get a result.


In order to be useful, you probably want to see AB tests reaching a conclusion in under 30 days. I'd say for a conversion rate goal, this is going to be when you have around 100k visitors a month.

There are a few variables to consider:

- What goal would you like to AB test? Conversion rate is an end-of-funnel goal that needs a lot of traffic; you can use upper-funnel goals like product views, add-to-bag, etc. to get quicker conclusions (not as accurate, but often a good approximation)

- The stats engine/AB testing tool you are using. Simpler tools might conclude quicker, but in my experience they can be so inaccurate they are counterproductive. Usually a long time to conclude = reliable results. I've never used Google Optimize so I'm not sure where it stands.

- How many people are being exposed to the AB test - for example, is it all web traffic or just mobile?

- How much of an effect the AB test has on behavior. A button color/text change will normally take longer to conclude than a feature that's really helping your users.

- How confident do you want to be before reaching a conclusion? I'd recommend looking for 95% confidence in uplift before concluding an AB test.
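
To make that last bullet concrete, here's a minimal sketch of a 95%-confidence check: a two-sided, pooled two-proportion z-test in stdlib Python. The visitor and conversion counts are made-up illustration numbers:

    from statistics import NormalDist

    def uplift_p_value(conv_a, n_a, conv_b, n_b):
        # Two-sided p-value for H0: both variants convert at the same rate.
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
        z = (p_b - p_a) / se
        return 2 * (1 - NormalDist().cdf(abs(z)))

    p = uplift_p_value(conv_a=480, n_a=24000, conv_b=560, n_b=24000)
    print(p)  # ~0.012 < 0.05, so significant at the 95% level here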


Google Optimize offers MAB testing, which is awesome. Personally, I don't see it as a "when do results become reliable" problem -- it's a "what is the cost of a false positive, false negative, or 'do nothing' decision?"


Multi-armed bandit tests need even more traffic, as there are more variations being tested. I think you have to be careful with false positives in AB testing - drawing conclusions too fast can nullify its usefulness.


The difference between "traditional" A/B testing and Multi-Armed Bandit has nothing to do with the number of treatments. A/B is really shorthand for A/B/n.

In an A/B test, the probability of selecting an "arm" (a treatment to show to the visitor) is equally distributed. Google Optimize differs by adjusting the traffic distribution, sending more traffic to better-performing options, which is what's usually called the MAB approach.

The difference is that MABs will more quickly converge on the "winning" variation, but are more likely to get stuck in a local optimum. E.g., if a worse option performs better right off the bat, the MAB might send most traffic to it, and it would take a while for the algorithm to "recover".

The major advantages of the MAB approach are minimizing opportunity costs and the ability to capture seasonality because you can continuously run tests. Traditional split testing runs for a while, gets a result, and moves on. With a MAB, you can assign an "explore" budget that keeps tests running in the background to capture the seasonal/periodic change in conversions.

That comes with a cost: every visitor who doesn't see your best page is lost revenue.
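
For intuition, here's a minimal Beta-Bernoulli Thompson sampling sketch of the "send more traffic to better-performing options" idea (plain Python; the true conversion rates are made up, and this is not claimed to be Google Optimize's actual algorithm):

    import random

    true_rates = [0.020, 0.023, 0.018]   # hidden "real" conversion rates (made up)
    wins = [1] * 3                       # Beta(1, 1) uniform prior per arm
    losses = [1] * 3
    shown = [0] * 3

    for _ in range(50000):               # 50k simulated visitors
        # Sample a plausible rate from each arm's posterior; show the best.
        samples = [random.betavariate(wins[i], losses[i]) for i in range(3)]
        arm = samples.index(max(samples))
        shown[arm] += 1
        if random.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1

    print(shown)  # most traffic should have drifted to arm 1, the true best

An unlucky early streak for the true best arm also demonstrates the caveat above: the bandit piles traffic onto the early leader and takes a while to recover.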


If you have an easy way of rolling out tests, and enough data to reach statistical significance, IMHO it's never too early to test.


At least thinking about it is important NOW, regardless of what stage your startup is at. It can be extremely difficult to slot a 3rd-party A/B testing solution into your product (or really hard to roll your own) if your infra doesn't support it from the start. Also, hire a data scientist! (Disclaimers: I am not a data scientist, I just think everyone needs one! I have worked on an experimentation system at $BIG_COMPANY.)

I'd suggest thinking about the following BEFORE YOU RUN A SINGLE A/B TEST:

1) Key Metrics: Define these. They are the general, "I don't care what your experiment is about, these numbers are important." Every experiment you run should automatically track these metrics. You should also give the ability to define custom metrics, since an experiment that changes some random button color probably wants to look at how many people clicked the button, which is almost definitely NOT a key metric.

2) Logging Infrastructure: Make sure that you have an easy-to-use, reliable data pipeline set up for logging and processing events. Bad logging == bad experiment results. Also consider streaming vs batch processing for updating experiment results.

3) Population Management: How do your experiments segment users? Are variants calculated in realtime? Batched with some SLA for lag? Are they sticky? (A sketch of one sticky-assignment approach follows this list.)

4) Mutual Exclusion: People running experiments often want "their" users excluded from other experiments.

5) Guardrails: Do your experiments automatically shut off if there is a catastrophic decline in one or more key metrics? What safety measures do you have around determining if an experiment is safe/valid? How do you handle cleaning up data when there's a problem? What sorts of actions invalidate an experiment's existing results? Does your entire site break if your A/B Testing service is down for whatever reason?

6) Cleanup/Ownership: Experiments don't run forever (at least they shouldn't!). Cleaning up old features, populations, etc. can be a pain, especially when the people that wrote the stuff originally no longer work at the company. Make cleanup mandatory and as easy as possible.
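
On point 3, here's a minimal sketch of one common way to get sticky, realtime variant assignment with no lookup table: deterministically hash (experiment, user) into a bucket. The names and 50/50 split are illustrative, not from any particular system:

    import hashlib

    def assign_variant(user_id, experiment, variants=("control", "treatment")):
        # Same user + experiment always hashes to the same variant.
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        return variants[int(digest[:8], 16) % len(variants)]

    print(assign_variant("user-42", "button-color"))  # stable across calls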

There's a lot more, but I'm tired now. A/B testing is complex. There are lots of resources out there, though. Look for white papers on the subject, they're surprisingly approachable. Example from Microsoft: https://exp-platform.com/Documents/2017-08%20KDDMetricInterp...


I am wondering whether it is based on their own Google HyperTune / Vizier [1, 2], modified to better deal with uncertainties, or whether it is an entirely independent in-house development.

[1] https://cloud.google.com/ml-engine/docs/tensorflow/using-hyp...

[2] https://ai.google/research/pubs/pub46180


"Published Mar 30, 2017"

(I'm not complaining–it's interesting and I'll give it a try. But it's not brand new)


Where is the requirement that we only see "brand-new" stuff on HN?


Well, it should probably be tagged (2017).


There is no such requirement, and I thought I was clear that I appreciated the post, and was merely trying to add the information.

Also: the headline has "now" in it. It's not the end of the world, but I've seen correction notes on NYT etc. articles for far smaller matters. It's just a general principle that true information > false information.


When the title includes “now free” but it’s old news, I think it should include the year in the title


Hacker News :)


I just hope they won't kill Experiments in Google Analytics.


"Google content experiments to be deprecated"

https://github.com/dwyl/learn-google-optimize/issues/8


Yeah, I know, we'll use it until they actually kill it. Nothing really compelling in the Google Optimize offering makes me want to switch, but maybe I am wrong.


Why? This is a straight up upgrade over content experiments.


I don't remember exactly. I've tried a couple of times without success. I like the way you can set up two pages and do your regular dev process with the GA experiments. If you want to do something server-side, you can't, as it's only client-side JS with Google Optimize. So you can only A/B test cosmetic changes. Or maybe I missed something.


So Google just killed crazyegg and optimizely?


I am curious how long this promise will be kept.



