Interestingly, Alexa was founded by Brewster Kahle, who founded the Internet Archive contemporaneously (both in 1996). Interesting flow of ideas between the two projects - one to figure out what is getting traffic (commercial) and one to figure out how to preserve it (non-profit).
It seems that the flow has reduced somewhat in recent months: Alexa haven't been providing crawl data to the Internet Archive since January 2021. (You can see this by looking at the "Items" graph on the Alexa Crawls collection [0])
I was digging through their entries, this is the latest scrape that I could find. Haven't successfully unpacked anything yet, but it seems to be legit [1].
Alexa is my go-to place to get a first impression on how much traffic a website gets.
I have never looked into how they get their data, I always assumed they get it from internet providers?
Is there an alternative? SimilarWeb publishes their data with a huge delay, so it is not of much use for me. From my experience, it is also less reliable.
It is crazy how valuable internet properties get closed down when they are owned by giant companies. According to Alexa, Alexa is still a top-5000 site:
I started a news aggregator that at the peak was ranked in the top ten Alexa sites in a country (was doing 100M page views / month). Alexa was the standard that people would use to judge how successful a site was.... Until people figured out how to massively game it.
For a pretty small sum, you could approach illegal download or porn sites to embed an iframe of your site or do a pop-under ad, and instantly be in the top 10. People aren't actually visiting your site, you're just getting "shadow traffic". Then you'd go tell advertisers "look, we are one of the biggest sites in the country".
We didn't play that game but our competitors did and it was really frustrating.
Anyway, the point is their rankings weren't very reliable (this was 8+ years ago, maybe they've gotten better at detecting traffic fraud).
apple.com being so high is (to me) a little counter-intuitive. microsoft.com too for the same reason. They don't seem like the sites people would be using a lot in their day-to-day lives. I guess it's counting dns lookups from devices and not necessarily "human" web page/app requests?
It's been forever since I've seen this first hand, but iirc apple.com and microsoft.com are both default homepages in Safari on macOS and Edge on Windows, respectively. That may artificially inflate the numbers a bit too.
If you’ve worked on or around an enterprise gateway team, you’ll learn pretty quickly that 1) Microsoft, Akamai, Cloudfront and Apple traffic appear to represent 95% of traffic in a given organisation, and 2) providing meaningful reports on internet usage for executives is *hard*
Somewhat over two decades ago, the #1 site in the web cache stats for the web cache between an ISP dialup pool and the internet at large was "icons.hotmail.com" (may have been a number on "icons", actually), with probably 2-3x the hits of the first porn site.
"Am I on a captive portal?" tests might make up a decent chunk of requests, I believe both Apple and Microsoft have equivalents to http://detectportal.firefox.com/success.txt
how many iDevices and their apps/services ping back home via a *.apple.com domain? In other words, is it possible that the traffic isn't from people visiting the website in a browser?
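For what it's worth, the captive portal check mentioned above is a tiny request, which is why it could add up across millions of devices. A rough Python sketch, assuming the probe just fetches a known URL and compares the body to an expected string. The function names, the expected body, and the matching rule are my own guesses, not any vendor's actual implementation:

```python
import urllib.request

# URL quoted earlier in the thread; the expected body is an assumption.
PROBE_URL = "http://detectportal.firefox.com/success.txt"
EXPECTED_BODY = "success"


def looks_like_captive_portal(status: int, body: str) -> bool:
    """Return True if the probe response was apparently intercepted.

    A captive portal typically redirects or rewrites plain-HTTP requests,
    so anything other than a 200 with the expected body is suspicious.
    """
    return not (status == 200 and body.strip() == EXPECTED_BODY)


def probe(url: str = PROBE_URL, timeout: float = 5.0) -> bool:
    """Fetch the probe URL and classify the response.

    Needs network access; network errors propagate to the caller.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read(1024).decode("utf-8", errors="replace")
        return looks_like_captive_portal(resp.status, body)
```

Every device running something like `probe()` on each network change generates a request (and a DNS lookup) with no human visiting anything.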
I would probably have made the same off-the-cuff comment before I signed up.
You can make the same comment about YouTube or Twitter, but it all depends on what content you ingest. TikTok's algorithm actually serves me incredible content that is both educational and entertaining and I thoroughly enjoy using the app.
(And I don't consider myself some sort of floozy - I've been using the Net since before the first commercial web sites appeared)
I've never looked at TikTok, but every once in a while I look at the user comment sections of some pretty reputable world / local news sites. This alone leaves the impression that the world populace is but a bunch of dumb indoctrinated rabid racists and haters.
Also interesting doubleverify.com is ranked 14th, the first I did not hear of.
Quickly looked them up, some sort of anti-fraud company, working with TikTok apparently. Their website doesn't load for me, I'm wondering if they are the target of a DDoS attack which might actually move them up the rankings haha
DoubleVerify is an adfraud and brand safety tracker. Basically, they ensure that your ads are displayed to people and not on questionable content if you don't want that. Most agencies and large direct advertisers add this 3rd party to ensure actual humans are seeing the ads, and thus the advertiser is paying for “good” eyeballs instead of “bad”. It’s essentially a 3rd party checking that the ad network doesn’t do anything naughty, which some sites have done in the past.
I've noticed some non-native English speakers use "double verify" where native speakers would use "double check" or "verify". It could be a play on that.
Or they had a meeting where the founders asked "what domain sounds twice as good as verify.com?"
> Alexa is my go-to place [...] I have never looked into how they get their data
Right, this is a problem with all sorts of data sources that provide numbers (and use lots of SEO) but don't talk much about their methodology. CelebrityNetWorth is another example of this.
I've come to assume that celebrity net worth sites are mostly just made up numbers. Sometimes you can look up the payout for some specific jobs, but not all, and most of the time, they seem to just ballpark a guess based on that.
> I've come to assume that celebrity net worth sites are mostly just made up numbers.
That’s how I increased my net worth to 100 million USD. Now I’m rolling out an unstable coin trading under the symbol UCCT (UCrayCrayToken) backed by my CelebrityNetWorth.com value. I’m going to burn half the tokens soon, so now is a great time to buy tokens from me before they double in value.
> Alexa is my go-to place to get a first impression on how much traffic a website gets.
Alexa hasn't been a reliable source of traffic data for many years. It's gotten worse as mobile devices, private browsers, VPNs, and tight-fisted companies (like Facebook) have become more widespread.
If you own a high-traffic site and check Alexa, it's not even close. One of my sites wasn't even within an order of magnitude.
Tranco list [1] is considered the most accurate source for relative site ranking by traffic. It is a result of triangulation of data from several sources (one of them is Alexa) and it is what we use at Kagi Search for domain information [2]
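To give a feel for what "triangulation" of several rankings can look like, here is a minimal Python sketch that merges ranked lists with a 1/rank (Dowdall-style) score. I'm not claiming this is exactly how Tranco does it; the function name and the sample lists are made up for illustration, and their paper has the real details:

```python
from collections import defaultdict


def combine_rankings(rankings: dict[str, list[str]]) -> list[str]:
    """Merge several ranked lists into one using a 1/rank score per list."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_domains in rankings.values():
        for position, domain in enumerate(ranked_domains, start=1):
            # A domain missing from a list simply gets no points from it.
            scores[domain] += 1.0 / position
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical input lists, not real ranking data.
sample = {
    "list_a": ["google.com", "youtube.com", "apple.com"],
    "list_b": ["google.com", "apple.com", "facebook.com"],
    "list_c": ["youtube.com", "google.com", "facebook.com"],
}
combined = combine_rankings(sample)
print(combined)
```

The point of combining is that gaming a single source moves the merged rank much less than it moves that one list.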
Yeah, it was also really handy for figuring out how to attack various web properties, as they frequently indexed administrative pages that were secured through nothing but obscurity, as the toolbar was most popular with amateur (and (im)professional) webmasters.
It was always biased, and it's only gotten worse over time as browser toolbars have ceased to be a thing and data collection in browser extensions has been heavily discouraged. I have to wonder if that's one of the reasons they're phasing the product out.
Handwavey "proprietary methodology" using data from "millions of Internet users using one of many different browser extensions" and "direct sources in the form of sites that have chosen to install the Alexa script."[0]
Google Analytics would only work on sites that run Google Analytics, which would exclude sites run by the other big tech companies. Alexa worked because their toolbar addon would record every site their users went to, regardless of what was running on the site.
I thought they had a browser toolbar/extension which they use to collect data from a very very small subset of internet users, which is probably incredibly biased to a certain audience. (e.g. boomers who don't know how to not install random toolbars when downloading stuff).
It's sort of wild to think that Amazon purchased Alexa.com in 1999 for ~$250M in stock, and that stock would be worth more than $7B today, if my math is correct.
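A quick sanity check of that math in Python, taking the two dollar figures from the comment at face value and assuming (my assumption) a 1999 purchase valued around 2021:

```python
purchase_value_usd = 250e6   # ~$250M in stock (figure from the comment)
current_value_usd = 7e9      # ~$7B today (figure from the comment)
years_held = 2021 - 1999     # assumption: 1999 purchase, valued around 2021

# Overall growth multiple and the equivalent compound annual growth rate.
multiple = current_value_usd / purchase_value_usd
annualized_return = multiple ** (1 / years_held) - 1

print(f"{multiple:.0f}x overall, about {annualized_return:.1%} per year")
```

So the claim amounts to roughly a 28x multiple, which works out to a mid-teens percentage compounded annually over that period.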
Bought a website ranking company, used the company's name as the name of a consumer electronics assistant, and then shuttered the company 22 years later to (presumably) be able to use the domain name for more consumer electronics.
In essence, what the original Alexa did seems to have been internalized into Amazon ads (or their b2b analytics division), which is coincidentally the fastest growing part of Amazon.
Wonder why they used the Alexa name for their home assistant.
This just struck me, and I don't know if it's true, but a-lex could be a play on Greek and Latin for "not" and "written." Which is a pretty good name for a voice-based input system.
You have the order of creation wrong. AWS Lex, Polly, and Transcribe began as forks of Alexa's NLU, TTS (which was an acquisition called Ivona), and ASR components.
>Early on, the team realized they needed a "wake word" that would make the device start listening. The word would need to have three syllables, a "distinct combination of phonemes" so as not to unnecessarily rouse the device, and an easily marketable name, like Apple's Siri.
I'd say top 100 for about the last 25 years makes it pretty common. What surprises me a bit is that it didn't fall off a cliff after Amazon's Alexa came out.
Yeah I wish I could change the wakeword. I moved to a homepod mini from echo dots for more privacy but it's annoying not being able to change it. I'd prefer one that's one word like.
Also bad that it won't check your email etc without an iPhone.. Not very happy with it in general but I love the yellow colour and I trust Apple a bit more for privacy.
One thing that is great is that it matches your own volume. If I ask a question quietly at night Siri responds in a very low volume whereas the echo dot just booms like it's daytime.
My Google home accepts being called a variety of things that aren't Google, possibly in order to accept different accents or slurring, possibly just because it's not very strict with exact sounds being used. I imagine this is actually a good thing if you're half asleep talking into the pillow, have an uncommon accent, etc.
I know it's not perfect, but if you use Amazon's Alexa and know an Alexa you can change the wake phrase to a few others (iirc "echo" and "amazon" are among the choices).
Our wake word is "amazon", lest we forget a major corporation is listening in to our conversations about who left the fridge open and whether we need to wear a raincoat today.
It’s not just a question of being a friend or family member's name. It’s also easy for a YouTube video etc to accidentally include say “Alexia turn off the lights” as part of seemingly innocent dialog that nobody building it thinks is going to cause a problem.
It's been literally years since someone saying Alexa on a TV show or elsewhere has triggered my devices. Occasionally I've seen them switch to "listen", but they've gotten good at not reacting.
I'm assuming part of it is that it's gotten very good at recognising who is speaking. E.g. my son and I tested the limits of it a while back by changing voices (it still correctly identified my son when he tried to change his voice), mixing and matching (I'd say "Alexa" and he would say "who am I?" and vice versa; Alexa would recognise whoever said "Alexa" irrespective of who spoke what followed). So it'll usually know if whoever is saying "Alexa" is a known member of the household, which would seem like a good indicator to increase the threshold for how clearly the phrase is spoken before activating.
I'm pretty sure this is exactly how Google does it (only matching voices on the keyword); my Google home will ignore anyone else saying "ok Google", however if someone's yelling at me from another room or talking on the phone in the hallway, it's impossible to get anything useful out of it; it'll activate on the keyword and start processing whatever the first sounds it hears (dinner's ready, ok, ok, ok; I'm leaving now; you've got a phone call; etc). Before I have a chance to tell it to pause my music so I can hear the person yelling, it responds to them, not me. Before I can ask for the weather, it responds to someone walking by on the phone that happens to say something just outside my door at just the wrong time (this happens often when you live with people working from home)
I'm guessing there's something about how they're doing recognition of who is speaking that would make it hard to scale the cloud based speech recognition. Though I also notice that Alexa can answer more questions with the network down than it used to. Early on it was pretty much only the wake words, and then it'd give an error no matter what. Now it "listens" and will answer some requests to some degree without a network connection (e.g. if you ask to set an alarm it'll tell you the network is down and it can't set new alarms, but that it will still alert you, so it'd seem the full smarts of parsing an alarm request goes to the cloud, but it recognises at least enough to know you've asked about an alarm)
The nice thing about "OK Google" was that it sounded a lot like "Cocaine Poodle", so you could imagine your personal assistant was a hyperactive creature of limited intelligence. Which honestly, sets expectations appropriately.
Except that when my business partner and I are talking to each other about Alexa, we have to refer to it indirectly, e.g. "the A-lady", to avoid triggering his Alexa devices (I don't normally have one active).
If our Google Home triggers accidentally it's usually easy to think back on what we said and find the offending phrase. Siri, on the other hand, seems to trigger several times a day without any explanation.
My watch’s Siri seems to trigger when washing my hands, or sometimes when I am in the kitchen and somebody else opens the tap.
I was on a zoom call the other day, both of us using speaker (no headsets) and something triggered both our watches to say, “sorry, I didn’t get that”, in each of our languages. At least we both laughed.
Mine too, but I think you’re probably holding down the crown in the physical action of the washing. Your hands are bent in unusual ways and the back of your hand pushes up against the crown.
I don’t think it’s responding to a vocal ‘Hey, Siri!’ in this case.
According to my former manager who worked on the Alexa team at one point, it was a name with a high true-positive rate for their voice recognition system.
Did they just have extreme foresight about wanting the domain alexa.com or were they genuinely interested in alexa.com as a product? They kept it around for 22 years after all but I can't see how what alexa.com used to do has anything to do with what amazon is doing.
They bought it for data gathering and market/competitor research. I was with Amazon at the time. Was great for keeping track of what products were drawing most traction on eBay (Longaberger baskets and Beanie babies, says my vague memory)
Other companies do that, too - I think it's due to trademark law and how it's much harder to establish a new name than reuse an existing one. Cortana comes to mind.
According to the README in the parent directory, those were downloaded from archive.org. That's brilliant -- it didn't occur to me to check the Wayback Machine for a zip file, but it's totally there: https://web.archive.org/web/*/http://s3.amazonaws.com/alexa-...
A huge piece of the internet is going away. Back in the early 2000s I would use Alexa a lot; it was a gold mine of information, to the point where some people bragged about their rank on Alexa. That was hilarious, like the modern "twitter followers" / "github stars".
I worked for amzn for over a decade until 5 years ago... nobody ever talked about Alexa.com (I'm sure there was a department who did, but overall, it just never came up).
Tranco[0] was already mentioned by another comment, but I recommend reading their paper[1] to see how these lists are created, and how they can be manipulated (paper covers Alexa, Cisco Umbrella, Majestic Million, and Quantcast).
From the introduction summary: "e.g. only one HTTP request suffices to enter the widely used Alexa top million. We empirically validate that reaching a rank as good as 28 798 is easily achieved."
> No effect, I believe, since Alexa haven't been providing crawl data to the Internet Archive since January 2021.
Were they still providing value to the Wayback Machine at that point? Has their cessation had a significant impact on the Wayback Machine's crawling ability?
It was a bad idea 25 years ago, and it's still a bad idea. I will never forget the day our company fired two brilliant marketing guys just because the Alexa ranking of the site dropped 100 points.
On a logical level, public rankings are almost always a bad idea because they incentivize hardcore gaming of the system. On an emotional level though, they are a fantastic idea because we're always looking for simple heuristics to make quick judgements by
I had some friends who worked at Alexa. I recall they had these bare metal servers down in the basement in these old barracks buildings in the Presidio. I recall being impressed that the whole thing ran on a 1GbE connection. Also was my first introduction to companies that had lots of perks like free food, sitting on Yoga Balls, etc…. I remember they had this raging argument about milk as a condiment versus milk as a beverage. Seemed like a dream to me at the time.
I don't understand how apple.com can rank over youtube.com? In what way is apple a daily service that people must access? I guess some apple service that I don't use like music?
Probably apple telemetry. At least some of the phoning home is done using apple.com as a domain so it can add up very quickly. If anything I'm surprised the top domains aren't just telemetry endpoints, but I guess telemetry is a good use case for bare IPs so it doesn't show up as much on DNS analytics.
Cloudflare only has access to DNS requests made to their resolvers, not the amount of bandwidth utilized (which would certainly push Netflix and YouTube up) or the number of requests made. They also have some vaguely defined fudge factors to their rankings.
To take a guess at why Apple is so high, I think bundling all of these together might help their rank:
- App store downloads
- iOS updates
- Apple TV+ streaming
- Services that iOS / tvOS / macOS devices utilize
- Apple Music / iTunes downloads
- iCloud (including Private Relay and related traffic)
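The bundling effect in the list above can be sketched in a few lines of Python: if a resolver counts queries per hostname and then rolls them up under the parent domain, a vendor with many busy subdomains can outrank a single very popular site. Everything here is illustrative: the hostnames and counts are made up, and the "last two labels" shortcut is an assumption on my part (a real implementation would consult the Public Suffix List).

```python
from collections import Counter


def registrable_domain(hostname: str) -> str:
    """Naively reduce a hostname to its last two labels.

    Assumption: good enough for this example, but wrong for suffixes
    such as "co.uk", where the Public Suffix List is needed.
    """
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def rank_by_parent_domain(query_counts: dict[str, int]) -> list[tuple[str, int]]:
    """Sum per-hostname query counts under each parent domain, busiest first."""
    totals: Counter[str] = Counter()
    for hostname, count in query_counts.items():
        totals[registrable_domain(hostname)] += count
    return totals.most_common()


# Made-up hostnames and counts, purely for illustration.
sample_counts = {
    "updates.vendor.example": 40,
    "store.vendor.example": 35,
    "sync.vendor.example": 30,
    "www.video.example": 90,
}
ranked = rank_by_parent_domain(sample_counts)
print(ranked)
```

In this toy data no single vendor hostname beats the video site, yet the rolled-up vendor domain comes out on top.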
Sounds like those would provide rather biased samples because changing your DNS server from whatever you get from your ISP via DHCP is something that approximately no normal person would ever do.
It used to be that people with the Alexa toolbar were measured to produce a ranking. I wonder how they have been doing it more recently and how accurate it actually is now?
Toolbar lost traction post IE6 which is where their accurate website traffic reports came from. That was a long time ago and it’s been a joke since then. Took way too long to retire this.
They get data from the Alexa Toolbar to estimate metrics, and for certified insights you have to pay for the service and install a script (or a plugin if you use WordPress). This is so unfortunate and sad... This is such a good tool. What will you guys be using instead of alexa.com: SimilarWeb? Semrush? Moz? None of them are as good as our beloved Alexa Internet.
I have an ex boss who at some time told me he used to look up names of ex employees to verify that none of them earned more after they left that company
He paid decently, true, but given the way things developed (he had a couple of seriously toxic senior engineers) I'm happy to say that I went to 20-40% (depending on how you look at bonuses, in Norway that is a lot anyway) higher salary in my next job and have kept getting raises since :-)
Now the two problems have departed, he wanted me back twice and I wanted to go back but he couldn't afford my current price.
Sad for both parties.
Edit: in Norway taxable salary and taxable wealth used to be publicly available.
Previously, up until 2010-ish?, newspapers etc. could ask for the public tax records from the government and make them searchable.
Now that's not possible, but they are free to list the 10-50-100 richest people in each municipality, the sports people earning the most, the business men with the highest wealth or lowest tax etc.
Everyone can still look up individuals, though, but they have to log in to a government portal to do it, _and_ the individual being looked up will be notified about it, and who did it.
Looking up random strangers seems like a fun hobby. "Who is this person, and why is she looking me up? Is it someone I worked with? Someone I should remember? Am I such a terrible friend that I forgot about her completely?"
In Finland the system is similar, but recently changed in a somewhat entertaining way. The tax office used to publish a summary list of highest earning individuals for media's benefit. Then some politicians got involved and demanded that you should be able to deny being on that list. So now they compile a list of people who have denied the tax office from listing them.
Turns out there are no legal grounds for keeping the other list secret, so the media gets a copy of that list through some sort of transparency request. They then ask the tax office for the income information for each of these people. So now the yearly "jealousy lists" are published with all the names just as before, but some names have an asterisk next to them, pinpointing that these people didn't want to be on the list.
Before: anything. Everybody did it if they had the slightest reason if my observations were correct. Just like I remember Norwegians taking a sauna (those who had) with friends or family or showering before swimming or after exercise at school, nobody thought about the fact that they were naked between peers it seemed.
Today every kid complains about showering at school and everyone is very secretive about their income records.
This is just observation, not judgement. I think there are good reasons on both sides of both questions even if I am conservative as few (not American "conservative", I just think it is smart to change society slowly and thoughtfully).
Yes that was my point. When it was publicly accessible for everyone there wasn't a problem, but now that the information has to be explicitly requested (and the subject is alerted and provided the requestor's details) I was curious as to whether there were still any socially-acceptable reasons to request that info.
Judging by the number of replies to this, it could possibly be a viable side project with a few added premium features like notifications for downtime, data breach reports, etc.
Wow this is something where Amazon wants to kill a service for no apparent reason. Maybe to use the domain for their Alexa AI instead? Otherwise to restrict information on the internet.
I imagine today is a great day to be working at SimilarWeb. When I did SEO related research, these two were my go to for determining how popular a website was.
Such a bummer. Alexa played a key role in triggering my interest in programming and understanding data. I remember in 2007ish a relative showing me his blog with great pride that his blog was ranking in the top 20,000. It was amazing to me because if you search some specific terms on google and click next page enough times you can actually come across his website.
Wow. That's a real bummer -- an end of an era. I've been relying on Alexa (as part of various browser extensions over the years) to get a sense of the popularity of a given website for years. Not sure what a good alternative is. If you're aware of one in the form of a Chrome extension, let me know.
Alexa is the worst tool ever and I'm glad it is closed now. According to their "ranking", my website is hundreds of times smaller than it is, just because my visitors don't install their crappy browser plugin. Even random numbers will provide much better ranking than this garbage.
Why is Amazon retiring Alexa as opposed to spinning it off and selling? It must have some value. Google is great for shutting projects that must be worth something spun off, even at an early stage.
Is there a general reason tech companies retire over selling?
opcode QUERY
rcode NOERROR
flags QR RD RA
;QUESTION
alexa.com. IN A
;ANSWER
alexa.com. 59 IN A 13.249.109.63
alexa.com. 59 IN A 13.249.109.113
alexa.com. 59 IN A 13.249.109.43
https://en.wikipedia.org/wiki/Brewster_Kahle