AWS US East is experiencing high error rates on several services
276 points by oliverfriedmann on Sept 20, 2015 | 170 comments
The services CloudWatch, SES, SNS, SQS, SWF, Auto Scaling, CloudFormation, Directory Service, Key Management, and Lambda have been experiencing very high error rates for about 3 hours now.

DynamoDB is throttling API access and seems to be having issues with the management of metadata.




We were in the middle of a large infrastructure change starting at 4:30am this morning, including taking our application offline. I'm very thankful that we did dry runs, along with timing how long certain operations like RDS restores should take, and planned abort steps in case something went wrong.

We noticed that RDS and ElastiCache backups and restores were taking much longer than expected, and once the first set of errors about DynamoDB came in we decided to abort and try again on another date. An hour later we got notifications that RDS was having issues as well. I'm disappointed that it takes so long to update the AWS status page when things aren't working properly.


Similar story here... The status.aws page has serious delays.


The main issue appears to be DynamoDB

Here's a copy from the status page.

3:00 AM PDT We are investigating increased error rates for API requests in the US-EAST-1 Region.

3:26 AM PDT We are continuing to see increased error rates for all API calls in DynamoDB in US-East-1. We are actively working on resolving the issue.

4:05 AM PDT We have identified the source of the issue. We are working on the recovery.

4:41 AM PDT We continue to work towards recovery of the issue causing increased error rates for the DynamoDB APIs in the US-EAST-1 Region.

4:52 AM PDT We want to give you more information about what is happening. The root cause began with a portion of our metadata service within DynamoDB. This is an internal sub-service which manages table and partition information. Our recovery efforts are now focused on restoring metadata operations. We will be throttling APIs as we work on recovery.

5:22 AM PDT We can confirm that we have now throttled APIs as we continue to work on recovery.

5:42 AM PDT We are seeing increasing stability in the metadata service and continue to work towards a point where we can begin removing throttles.


6:19 AM PDT The metadata service is now stable and we are actively working on removing throttles.

7:12 AM PDT We continue to work on removing throttles and restoring API availability but are proceeding cautiously.

7:22 AM PDT We are continuing to remove throttles and enable traffic progressively.

7:40 AM PDT We continue to remove throttles and are starting to see recovery.

7:50 AM PDT We continue to see recovery of read and write operations and continue to work on restoring all other operations.

8:16 AM PDT We are seeing significant recovery of read and write operations and continue to work on restoring all other operations.


9:12 AM PDT Between 2:13 AM and 8:15 AM PDT we experienced increased error rates for API requests in the US-EAST-1 Region. The issue has been resolved and the service is operating normally.


This is manifesting itself as downtime for a lot of companies, including Heroku: https://status.heroku.com

If you want alerts on this sort of thing, my side project StatusGator https://statusgator.io will alert you when services post downtime on their status pages. My dashboard blew up this morning with a ton of red and yellow as soon as Amazon started flaking.

Edit: I suppose it's time to invest in a multi-region setup. Since StatusGator is hosted on Heroku in the US-East region, it is in theory affected by this problem, though so far it is still up.


From Heroku Status Page:

> Our service provider is still working towards resolution of this issue. We will update when we have news, or in 1 hour.

I wonder why they don't say that AWS is their service provider. Is it wrong to make the information less obscure?


> I wonder why they don't say that AWS is their service provider.

It's because Heroku's choice of vendors shouldn't matter to their customers. They see it as an implementation detail, and their responsibility to manage.

So I don't think that's an obfuscation. The people I know at Heroku all have an attitude of, "The buck stops here."


That's just stupid. Heroku rarely gives a proper technical explanation of their outages and drastically underreports their length and severity.

I assume this is to maintain their SLA. We really need independent third parties to record uptime for SLAs instead of trusting hosts to do it themselves.

This outage may be the last straw with Heroku for me. They also stated years ago that they would end their dependence on AWS East, and yet today shows that obviously hasn't happened.


Not that I think there's anything wrong with Heroku's messaging here, but "we're down because our service provider is having issues" isn't really a "the buck stops here" sort of update.


Because it is kind of awkward PR for Salesforce: come build on our platform! Oh by the way we haven't even moved Heroku, a company that we bought 4 years ago, over to it yet.


Why move it if it works fine as-is? I think Waze was bought by Google in 2013 and AFAIK, they're also still in AWS.


The PR is still awkward: "come use Google Cloud Services!" while it's not good/easy enough for your acquisitions to migrate to it from AWS.


Even Amazon.com isn't fully reliant on AWS. They're not using Route 53 for their DNS (or even for their backup DNS).


They will never be able to move Heroku off AWS. Their customers depend on being on the same cloud as other service providers.


The location of instances could be an option. I imagine if their locally-hosted instances were a bit cheaper they would be the more popular option.


Yes, as it's shifting the blame away from their choice - which was to use AWS.


I expect the reason is because Heroku could switch to a new provider in the future and it would be a pain to always update every reference to their provider.

Parent seems to imply there's something wrong with choosing AWS. There is not. (forgive me if I mistook the tone)


You did; there is nothing wrong - but it's a choice. I.e. your site being down is ultimately your fault, not AWS's - you chose AWS (instead of many other options, or building on multiple providers, etc.).

Blaming it upstream is just passing the buck on your own decision.


Indeed. If your site is down, it's your fault - no one else's. Yes, it may be down because someone upstream is down, but ultimately it's your choice where and how you host.


Interesting that the outage in us-east-1 is causing huge issues for Heroku in eu-west-1 too - obviously a region for Heroku isn't self sufficient.


Where do you see that it's causing issues for Heroku in eu-west-1? I have a lot of dynos in that region and I don't see any issues


Usually Heroku mentions a specific region in any status updates that don't affect all regions. For example, a status update 11 days ago stated "We're currently seeing increased routing latency in the US region. Investigation is underway."

AFAIK this outage only affected dyno restarting (which may have been triggered by a number of reasons) and creation of new dynos. Perhaps your EU dynos were lucky enough to not have done either of these things during the outage?


"Until this incident is resolved, you might be unable to open new support tickets with us."

Maybe they should host the ticket system on a separate provider?


This seems like another reason to not rely on Amazon-specific services, other than the obvious vendor lock-in.

At least in the event of an instance outage you could conceivably migrate off Amazon to another VPS provider. No one using DynamoDB has an alternative.


How would this have helped? In the event of an AWS outage, you have all hands of one of the biggest tech companies in the world working to fix your infrastructure. This outage lasted a few hours; wouldn't it take you that long to switch providers anyway? If you don't abdicate responsibility for maintaining the infrastructure, you lose a lot of the operational benefits of using a cloud service... In general, I think it makes a lot more sense to embrace whatever cloud provider you're using (provided you made a good choice) so that you can focus your efforts on your product and not on maintaining infrastructure.


> you have all-hands of one of the biggest tech companies in the world working to fix your infrastructure

No, they are fixing their infrastructure. The point here is that all single provider systems are destined for periodic failure. Not relying on one single provider is, in theory, a service's means of providing higher reliability. This is the general argument for regions and availability zones on Amazon, but that relies on trusting there is no single point of failure with the system (i.e. inside Amazon).

I've worked for several companies who run the majority of their services on AWS, yet maintain functional systems on other providers in the case of an AWS outage.


> wouldn't it take you that long to switch providers anyway?

It might, or might not, depending on how you built it.

> you have all-hands of one of the biggest tech companies in the world working to fix your infrastructure.

And it was still out for hours... You might see it as a great success, but you can also interpret it as "what good is it, if even with so many people behind the scenes they still failed for hours...".

The problem is that because that service ends up embedded in so many services and products, half the internet ends up being down. Even Amazon dogfoods their own stuff, so services rely on each other. If Dynamo is down, maybe SQS or analytics will be down as well.


This really is the first major failure of DynamoDB since I started using it, back when it was in beta over 4 years ago. Not sure many companies would be able to run a multi-million-QPS database with such a great availability record. In any case, we just moved traffic away from us-east-1 and into us-west-2 and all is still fine, so you should design at least your critical systems for regional failures.
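
Not a substitute for real replication, but a minimal sketch of what client-side region failover can look like with boto3 (the table name is made up, and this assumes you are already keeping the table populated in both regions yourself):

    import boto3
    from botocore.exceptions import ClientError, EndpointConnectionError

    # Regions/table are illustrative; DynamoDB does not sync the table
    # across regions for you, so both copies must already be maintained.
    REGIONS = ["us-east-1", "us-west-2"]
    TABLE = "sessions"  # hypothetical table name

    def get_item_with_failover(key):
        """Try each region in order; skip a region that errors or throttles."""
        for region in REGIONS:
            table = boto3.resource("dynamodb", region_name=region).Table(TABLE)
            try:
                return table.get_item(Key=key).get("Item")
            except (ClientError, EndpointConnectionError):
                continue  # region unhealthy, try the next one
        raise RuntimeError("no healthy DynamoDB region")

    # usage: get_item_with_failover({"user_id": "42"})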


Very good point. I wrote a blog post on this topic to try and characterize different types of lock-in. http://www.eightypercent.net/post/types-of-lock-in.html

This was also part of the strategy when we decided to open-source Kubernetes. Having an open alternative made the commercial offering (Google Container Engine -- GKE) much stronger because of the reduced dev lock-in.


I don't disagree that you would want to have a rough idea of what migrating off of DynamoDB would require, but wouldn't the easier step be using redundancy across regions first? Most of the sites which have suffered downtime due to AWS outages have been operating in only a single region (or even AZ!), and adding the extra level of isolation is usually going to be a lot easier than dealing with multiple vendors or having to maintain more of your infrastructure directly.


Running multi-region services has associated costs. It's also a non-trivial engineering effort to convert most services to a multi-region setup. It's definitely worth the investment though. Another option is to run multi-region on the same provider (like AWS) without using proprietary services like DynamoDB, using something similar such as Cassandra instead.


There are many high profile companies down (airbnb, IMDb, tinder, ...) so apparently this is not so straightforward.


I don't see Airbnb or IMDb being down. If there's any advice on what small-time apps (like mine) can do to get things up again sooner, please let me know.


Explains the weird 500s I was getting when trying to do a review this morning.


In this case AWS sign-in is down (perhaps it depends on Dynamo), so redundancy across regions can only help you if failover is fully automatic, which has its own problems (split-brain syndrome...).


One of the first things Amazon tells you about AWS is that the API is king - the console is hosted in us-east-1, but the API is hosted in each region independently.

You don't need automatic failover, but you do need your own scripts to talk to the AWS API. The whole point of AWS is automation.


AWS sign-in appears to be working, but in general it's true that you need some sort of automatic failover. However, isn't that true for every option other than accepting downtime? It doesn't seem like an AWS-specific challenge except at the implementation level.


I don't much use it anymore but this is something I always liked about AppEngine: baked into the pricing is automatic replication over three regions.

Serious AWS web apps are also distributed across multiple regions/availability zones, but many companies don't do that. It would be good if hosting services like Heroku, which I use, would also replicate across zones.

To be fair to Amazon, they tell you to architect and build across availability zones.


DynamoDB is affected in the entire region, however, not just a single AZ.


That's why parent post mentioned multiple regions. No other AWS region was affected...


DynamoDB now supports cross-region replication [0] so you can build more resilient applications with it.

[0] http://docs.aws.amazon.com/amazondynamodb/latest/developergu...


I don't think cross-region replication would've helped in this case:

"The replica tables are intended to serve as read-only copies of the data; however, it is possible to write data to a replica table. If you write data to a replica, those changes will not be propagated to the master, or to any other replicas."


You can relatively trivially build multi-master cross-region replication in DynamoDB by using Kinesis and writing to Kinesis instead of DynamoDB directly. On the consuming end of Kinesis you then fan out to all the DynamoDB (or whatever other database you want to use) regions[1]. Admittedly this only works with relatively relaxed latency constraints: intra-region latency can go up to 1 second, although rarely, while cross-region is around 3 seconds. An important role is also played by the structure of your objects and how accepting they are of concurrent updates coming from different regions (which is the main reason why the default replication in DynamoDB is not multi-master).

[1]: http://tech.adroll.com/blog/data/2015/06/26/kinesis.html
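
A rough sketch of that write path (names are illustrative, and the hard part - making concurrent updates from different regions commutative - is deliberately left out):

    import json
    import boto3

    STREAM = "table-writes"                      # hypothetical Kinesis stream
    REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]

    def put_update(item):
        """Producers write to Kinesis instead of DynamoDB directly."""
        kinesis = boto3.client("kinesis", region_name="us-east-1")
        kinesis.put_record(
            StreamName=STREAM,
            Data=json.dumps(item),
            PartitionKey=item["id"],             # keeps updates for one key ordered
        )

    def apply_record(record, table_name="profiles"):
        """A consumer reads the stream and fans the write out to every region."""
        item = json.loads(record["Data"])
        for region in REGIONS:
            boto3.resource("dynamodb", region_name=region) \
                 .Table(table_name).put_item(Item=item)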


You could at least fall back to a read-only mode. That could be very helpful, compared to going down completely.


DynamoDB was down in only one region, for the first time in years. It's hardly a reason to migrate off, especially with DynamoDB still being available in every single other region throughout. It's simpler to fail over to another AWS region than to an entirely different cloud provider with a different API.


Source for service with better uptime than Amazon?


I believe he meant things like using RabbitMQ instead of SQS (SQS isn't very good anyway, IMHO): more services running on plain old instances rather than on AWS-specific APIs, Azure-specific APIs, etc.

This prevents you from getting too locked in and losing control over your application.

I honestly can't believe that Netflix can't even load their home page without DynamoDB and all this other stuff.

Edit: Looks like Netflix is back online (10:53AM EST)


Yes, exactly. It's tempting to drink the Kool-Aid on the various services but the more you rely on them, the less flexible you become and more open to overall service failures like this one. You can take advantage of the benefits of EC2 without relying on the other services.

I mean, if even Netflix can't stay up during this, what hope does a startup have?


Netflix -IS- up, and working fine. Pretty amazing, imho.



There is only Zuul?



Can you hint at what this means, for the uninitiated?


I think he was making a reference to ghostbusters: http://knowyourmeme.com/memes/there-is-no-dana-only-zuul


Pocket just came back up also. Issue might be resolved?


I was watching Netflix this morning and a number of titles had trouble loading repeatedly. They did eventually work though.



This is silly. Do you think you and your staff will maintain your homebrew RabbitMQ installation with better uptime than Amazon's team? And how much time/expense/manpower are you willing to devote to setting it up and monitoring it? And is that going to make your startup more successful than if you spent those resources developing the features your customers are asking for?


Yes, competent people with good planning can easily do a lot better than that.

We do serious hosting for government and corporate clients, and if we had an outage like Amazon had yesterday, we would lose at least half of our clients and pay heavy cash penalties.

I don't understand why people give Amazon so much of a break. They have one of the worst uptime records.


Kinda agree; same goes for AWS tools. OpsWorks screwed us up back then due to an AWS outage, and we decided to decouple as much as possible from OpsWorks.


You still depend on some provider. Or are you talking about multi-cloud installations?


Alternatively, run a fallback data store.


I would highly recommend http://www.datomic.com as a database for multiple reasons, but one of them is the fact that it's totally portable between DynamoDB, PostgreSQL, MySQL, etc.

You can create a Datomic backup from any of these databases and restore them into a different one with the exact same semantics.


Isn't datomic closed source?


It is closed source, just like DynamoDB.

However - unlike DynamoDB - it has a quite usable free version and an even better starter version, which are good enough for production use. (The DDB local version is really only good for development.) By production use, I mean on the level of, for example, a non-replicated MongoDB/PgSQL/MySQL, which is what I see at many smaller companies... They would already be better off with Datomic.

If you really need more features, like high availability and transparent query caching with memcached, then the cheapest paid version has a one-time cost of $3000: http://www.datomic.com/pricing.html

You can easily save that amount of money by not spending 3-6 months developing tons of unnecessary queries, trying to maintain that code, and trying to optimize its performance for scale...

On the "open-sourceness" note, I feel obliged to mention https://github.com/tonsky/datascript which IS open-source, and has a Datomic-like interface, but it's "just" an in-memory implementation.

But we are getting off-topic here I guess...


That's what it looks like. And after 10 seconds of reading I couldn't tell what it actually does.


Could be why my address is now "incorrect" https://twitter.com/search?f=tweets&vertical=default&q=incor...


Well at least it's nice to know Amazon uses Amazon. Could still be unrelated, but awfully coincidental.


As an AWS customer, you need to be aware that the service health of all AWS services, not just the ones you use directly, is important.

You say you don't use SQS or SNS? When they go down, you might not be able to get logs or even log in to the web console.

Same goes for things like AutoScaling, OpsWorks, etc.


That's the beauty of micro-service architectures. You don't have a single monolithic point of failure, you have dozens of smaller ones.


This is why 99.99999999% uptime is a fallacy

It is not really measuring the time you're going to be up. That interpretation is based on faulty assumptions. It's like how the statement "the sun will burn out before one bit is flipped" is wrong: it is quite likely that by that time, all the bits will be gone.

https://signalvnoise.com/posts/3067-lets-get-honest-about-up...


Yeah, it is bullshit. They just had a failure, so now they can claim "oh, it's still 9 9s, it's just averaged over 400 billion years, not the 10 billion you assumed." So legally still cool though...


Yes, it is meaningless without a date range. 99.999% per year means something, while 99.999% uptime does not mean anything by itself.


DynamoDb never promised 10 nines of availability, so it's a bit silly to hold them to that.


I hate the fact that most people (including me, apparently) still assume AWS is not in their "downtime" equation. Just spent the last 30 minutes troubleshooting an SMTP auth problem.

Not funny when it's Sunday.


You're not alone buddy. Been wasting hours of my life looking at this stuff too :)


I also noticed the random auth problem in SMTP and my knee jerk reaction was to google "status aws"...


Between AWS, Google, Apple, Facebook, Twitter, and the like, I doubt I am alone in spending a ton of my time working around their various issues to run my tech stack.


Tinder is down due to this, now my life is pointless.


Not often you see a red status symbol on Amazon's status page (yellow is normally considered more than enough to indicate that the product is totally broken). Don't think I've ever seen _ten_ of them before.


The last time there were API outages in AWS, our autoscaling logic could not determine the number of running instances, so it felt it had too few. It kept launching instances, and due to the API outage we couldn't manually kill the instances either...

So we wound up with over 1,000 of these machines running, and because of the fan-out of DB data they each needed to load into memory from other machines, our whole environment crashed until we could kill off the erroneously launched instances.

This meant an effective full reboot of our entire platform...

It was not a fun weekend.
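
A defensive pattern for exactly this failure mode is to treat an unreadable instance count as unknown rather than as zero, and to hard-cap how many instances a single scaling decision can launch. A rough sketch with made-up names (not how the system above actually worked):

    import boto3
    from botocore.exceptions import ClientError, EndpointConnectionError

    MAX_INSTANCES = 50   # hard ceiling (illustrative)
    MAX_STEP = 5         # never launch more than this per decision

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def running_count(app_tag):
        """Return the running/pending instance count, or None if the API
        itself is failing -- an unknown count must not be treated as zero."""
        try:
            resp = ec2.describe_instances(Filters=[
                {"Name": "tag:app", "Values": [app_tag]},
                {"Name": "instance-state-name", "Values": ["running", "pending"]},
            ])
        except (ClientError, EndpointConnectionError):
            return None
        return sum(len(r["Instances"]) for r in resp["Reservations"])

    def instances_to_launch(desired, app_tag="web"):
        current = running_count(app_tag)
        if current is None:
            return 0  # API is down: do nothing rather than launch blind
        return max(0, min(desired - current, MAX_STEP, MAX_INSTANCES - current))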


Amazon CTO: We designed DynamoDB to operate with at least 99.999% availability :D


I've been using DynamoDB since it was released. In over 3 and 1/2 years of use, this is the first time I've experienced DynamoDB being down.


If it was down for longer than about 18 minutes, then they missed "5 9s" availability (0.3 hours over 3.5 years). Not that it is supposed to work that way.
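
The arithmetic, for reference (the whole ambiguity is what window the nines are measured over):

    # Downtime budget implied by N nines over a given measurement window.
    def downtime_budget_minutes(nines, window_hours):
        return window_hours * 60 * 10 ** (-nines)

    # Five nines over one year: ~5.3 minutes of allowed downtime.
    print(downtime_budget_minutes(5, 365.25 * 24))        # ~5.3
    # Five nines over 3.5 years: ~18.4 minutes, the figure above.
    print(downtime_budget_minutes(5, 3.5 * 365.25 * 24))  # ~18.4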


Downtime caused by a software bug vs. infrastructure/hardware availability are, to me, different guarantees. I am pretty sure someone did something recently to DynamoDB.


Infrastructure guy here, doing this for 14 years. Downtime is downtime. You get a pass if it's "scheduled maintenance" you've notified your customers about so they can be prepared, but if you silently perform maintenance and it goes to shit, it counts against your metrics.


Nope. I still disagree. No service can guarantee 99.999999% unless you discount software upgrades. You just cannot. If you think those nines include software upgrades, you are probably overly optimistic.


> No service can guarantee 99.999999%

Don't advertise it if you can't offer it then.

> If you think those nines include software upgrades, you are probably over optimistic.

If you advertise a product with a specific SLA, and you can't meet that SLA, you're a liar. Don't try to blame the victim because of inaccurate/untruthful marketing or engineering.


SLAs are just contractual thresholds for getting some specified redress if not met. They are not promises.

Not meeting a SLA is not lying.


I used SLA to communicate an advertised/marketed level of service. In this case, I agree, that SLA is the wrong term as there is no contractual agreement.


Maybe you are the one who needs to understand their SLA and fine print.


Why is us-east-1 so terrible? All of the downtime this year has been in Virginia.


us-east-1 is where they typically deploy new features/hardware first (with the exception of EFS, which went to us-west first for some reason). It's also by far the largest region, with the most tenants and the heaviest traffic, so it's approaching the limit of what's physically possible to do in a public data center.


It's the primary AWS region. You spin up your resources by default there unless you explicitly select another region in the console.


It's probably a good idea to pick other regions, especially the ones closest to yourself

However, us-east is usually the cheapest one as well


Is it cheaper than the downtime?


"4:52 AM PDT We want to give you more information about what is happening. The root cause began with a portion of our metadata service within DynamoDB. This is an internal sub-service which manages table and partition information. Our recovery efforts are now focused on restoring metadata operations. We will be throttling APIs as we work on recovery."

(http://status.aws.amazon.com/)


Reddit is down right now with a 503 - they're on AWS.


That's the reason I'm reading here right now vs. time-wasting on Reddit at the moment.


Go to twitter, they are not on AWS.


S3 and VPC themselves appear to be fine, as noted on the dashboard, but the S3 VPC endpoints in EC2 are not ("we are also experiencing increased error rates accessing VPC endpoints for S3"). I was able to restore my sites by removing the endpoints from the routing tables.
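
For anyone in the same spot, roughly what that looks like through the API rather than the console (a sketch only; it assumes boto3 and detaches the S3 gateway endpoint from whatever route tables it is associated with):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Find the S3 gateway endpoint(s) in this region and detach them from
    # their route tables, so S3 traffic falls back to the normal IGW/NAT path.
    endpoints = ec2.describe_vpc_endpoints(
        Filters=[{"Name": "service-name",
                  "Values": ["com.amazonaws.us-east-1.s3"]}]
    )["VpcEndpoints"]

    for ep in endpoints:
        ec2.modify_vpc_endpoint(
            VpcEndpointId=ep["VpcEndpointId"],
            RemoveRouteTableIds=ep["RouteTableIds"],
        )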


well, time to find out who has failure tolerance built in to the architecture 8^)


failure tolerance is an alien technology for amazon...


That's completely untrue. There are many ways to do fault tolerance in AWS. It's expensive, but it's possible. Netflix even goes as far as simulating the failure of entire AWS regions in their Simian Army testing suite:

http://techblog.netflix.com/2011/07/netflix-simian-army.html

That's why Netflix stays up when us-east or us-west are down.


It is true. Amazon offloads a decent amount of fault tolerance to the application provider, as you point out here. I will also mention that Netflix does not solely rely on Amazon for running their services. They run their own decentralized caching layer: https://openconnect.netflix.com/deliveryOptions/


Netflix is down.


Works fine for me


You know this is interesting. There were no symptoms for us at all that something was wrong with Amazon itself, and their status page was not updated in a timely fashion. I spent a few hours (in the middle of the night working with my laptop in bed next to my wife) trying to figure out what in the hell was wrong, only to find out through the grapevine that it was Amazon. This is extremely frustrating when providers are having problems and actively working on a solution yet their status page still has glowing recommendations of their service.


Le sigh. This is impacting AirBnB and I need to check in somewhere in LA later today. Good thing all the details are in the AirBnB messaging history with the host. Time for them to just go back to email.


Doesn't AirBNB also send an email copy of correspondence? I've always received emails (and text) when a host contacts me.


Yep. AirBnb is not working


I'm somewhat in the same situation. I want to check in a movie I just watched and give it an appropriate rating, but IMDb is down. I guess we'll just have to wait, right?


Any knowledge or evidence that IMDB runs on AWS, and that the two are thus correlated?


Well, Amazon owns IMDB, so it's probably a reasonable assumption.


Quick lookup says yes:

    # host imdb.com
    imdb.com has address 207.171.166.22
    imdb.com has address 72.21.210.29
    imdb.com has address 72.21.206.80
    # host 207.171.166.22
    22.166.171.207.in-addr.arpa domain name pointer 166-22.amazon.com.


For starters, Amazon owns IMDb.


Amazon Echo hasn't been working since 4am PST.


If you can't get in the console, use awscli. It is responding fine!


SQS is the specific service giving me a ton of trouble right now. Hope they resolve this quickly. Had Raygun alerts about SQS all night, heh.

So are they saying they are throttling SQS because of the DynamoDB issue?


Here's the SQS Error Log right now:

3:14 AM PDT We are investigating increased error rates in the US-EAST-1 Region.

4:06 AM PDT We can confirm increased error rates for CreateQueue, SendMessage and ReceiveMessage API calls in the US-EAST-1 Region and continue to work towards resolution.

5:07 AM PDT We can confirm increased error rates for CreateQueue, SendMessage and ReceiveMessage API calls in the US-EAST-1 Region. As we work towards recovery, error rates may temporarily increase.

6:06 AM PDT We can confirm significantly increased error rates for CreateQueue, SendMessage and ReceiveMessage API calls in the US-EAST-1 Region. As we work towards recovery, error rates may temporarily increase.


I have seen multi-hour SQS outages recently. I'm thinking of options for how we can go about preventing an application failure if this happens again.

* If adding to SQS fails, temporarily store the item on disk or S3, then add to SQS when it's back up? (rough sketch below)

* any other options?
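
For the first option, a minimal sketch of the spool-to-local-disk idea (queue URL and spool path are made up; the drain step would run from cron or similar once SQS recovers):

    import json, os, uuid
    import boto3
    from botocore.exceptions import ClientError, EndpointConnectionError

    SPOOL_DIR = "/var/spool/sqs-fallback"  # illustrative path
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # illustrative

    sqs = boto3.client("sqs", region_name="us-east-1")

    def send_or_spool(message):
        """Try SQS first; on failure, keep the message on local disk."""
        body = json.dumps(message)
        try:
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)
        except (ClientError, EndpointConnectionError):
            os.makedirs(SPOOL_DIR, exist_ok=True)
            with open(os.path.join(SPOOL_DIR, uuid.uuid4().hex + ".json"), "w") as f:
                f.write(body)

    def drain_spool():
        """Replay spooled messages; stops (by raising) if SQS is still down."""
        if not os.path.isdir(SPOOL_DIR):
            return
        for name in os.listdir(SPOOL_DIR):
            path = os.path.join(SPOOL_DIR, name)
            with open(path) as f:
                sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=f.read())
            os.remove(path)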


I'm not sure. I think many of the other services mentioned probably rely internally on SQS, so resolving the SQS issues might resolve most of the other issues as well.

Not completely sure though whether DynamoDB would benefit from relying internally on SQS.


Yeah good point. It's something I forget sometimes that AWS uses AWS... and that even if I don't rely on a particular service specifically, a service I rely on may, in fact, rely on that service.

Hopefully there is a relatively fast recovery on this.

Can anyone even log into their aws console right now?


I can log in.

I wonder if SQS uses DynamoDB, not the other way around.


Audible seems affected by this as well. I've made some purchases with my credits but the books are still not showing up in my library... and the checkout process is very slow.


FYI happened to me as well, but is resolved now.


The AWS KMS is not working. Critical payment application down =S.


Why would you run a "critical payment application" in US East? This datacenter has 10x the downtime of West or Ireland.


It's pretty interesting to see how much of our internet relies on cloud services like AWS, and how much is brought down by issues like this.


Address verification on amazon.com doesn't work for me at the moment, blocking me from making any orders.

Not sure if this is related.


Whoever uses autoscaling and, especially, lifecycle notifications with SQS will be in trouble now (I am).

The morning is about to start. Traffic will ramp up, and I'm not sure new servers will be launched because CloudWatch is failing. Polling SQS for the lifecycle notification messages fails too.


Not related to the AWS outage, but Rackspace CDN customers are in for a world of hurt today as well.

https://status.rackspace.com/index/viewincidents?group=28


Sign-ins to AWS console also appear to be timing out:

https://www.evernote.com/l/ABkKLgp3RjRDe5uV4pMlyVg1uzkW41DG4...


The AWS services stack is deep and deeply intertwined. I've always viewed depending on such stacks in production with skepticism, and I'd recommend everybody else do the same.

This might come across as tooting our horn a bit. But it's more about sounding a warning to other startups providing SaaS service built on public cloud. My own misgivings about relying on a cloud provider specific stack (both for the reasons of visibility/debuggability as well as for vendor lock-in) meant that PacketZoom services were not affected by this failure at all because we only use them as one of the many providers of raw machines. We use our own techniques to load-balance/failover among multiple cloud providers too (so even if the raw compute/network went away, our service would take a perf hit but not be completely down).


Or you could just run in multiple regions. Using multiple cloud providers limits your ability to take advantage of provider-specific features - why waste time writing your own load balancer when you could use ELB + multiple regions?


"Or you could just run in multiple regions."

Not when the original goal of the very service is to have presence in all geographical regions. If AWS us-east is hit, I want the users to transparently fail over to a server on the east coast (perhaps one hosted by Google or SoftLayer) rather than be directed all the way to us-west or the EU.

And as for ELB, one doesn't use ELB for a custom protocol that load-balances/fails-over itself from the client :-)


Free Rugby World Cup!

http://universalsports.com/

"RWC2015ppv.com has been affected by an internet outage. Watch here. Not all mobile devices are compatible"


Videos and Alexa are also down.


This appears to have completely blown away all Alexa data. Even searching for google.com returns nothing.


I noticed this because I was unable to checkout on Amazon Prime Now just now.


I noticed it when I couldn't stream something. The player passed it off as a Silverlight issue even when the Flash option was chosen. lol


nothing like having a short-movie done in 48hrs using only web services and then WeVideo goes stale just before I download... 2hrs before submission deadline :(


other down sites: medium.com, getpocket.com, idonethis.com


Great! Almost every takeaway in Dublin has moved to Zuppler for their online ordering... Zuppler is hosted out of AWS.


Wow, very interesting to see how much of the infrastructure directly or indirectly depends on AWS.


Docker, Wercker, and Travis CI are also affected. Can't log in, or stuff is really sluggish.


Nothing should be affected in non-US-East regions, as per the status page.


Better to have this happen on a Sunday than a Monday.


clouds are so great


[flagged]


That was from my phone which is why it was sparse...but...

> The Datomic software consists of the peer library and the transactor

> These components connect to one of several storage service options

I'm pretty well versed in computer science and buzzwords - but that means nothing to me. That was from the overview, where I expected a dead-simple "this is what it does and why you need it". There are several important questions that don't seem to be addressed:

- I'm guessing it's a database proxy that's intelligent?

- Is it better than HAProxy?

- How is it different/better?

And most importantly:

- Do I need to modify my current code base to interact with this thing?

In case you think I'm being overly dramatic - here are 2 examples:

https://www.statuspage.io/

On the front page I know exactly what they offer.

https://slack.com/is

On the product page I can see exactly what slack is and offers.


It's super off-topic to talk about this here. :/ I checked it on mobile too, but it looks the same as on desktop. It has a tagline similar to statuspage or slack: "The fully transactional, cloud-ready, distributed database."

To gain some insights into our differences, here is my thought process:

I agree that the description is quite fuzzy, but to me it clarifies that it's a database, not just a proxy. It suggests ACID properties by saying "transactional", and the "cloud-ready" part hints at scalability to me.

Then you would think "Oh, not again, another DB", so it continues with the "Why Datomic?" heading.

Afterwards the 1st link is "Read the Rationale", which explains everything about the software quite concisely. It's a database though, so don't expect to understand it in a few seconds.

That being said, the video on the 1st page gives a very solid explanation in its first 40 seconds...

Thanks for the examples; now I understand what you were thinking. Those taglines are really well done, but I think they had an easier job since they were describing much simpler services.


Prescribing psychiatric medication to another user is maybe not the best way to encourage them to comment more thoughtfully.

We detached this comment from https://news.ycombinator.com/item?id=10247727 and marked it off-topic.


The feedback is valid. If they can't figure it out in 10 minutes, then they can't figure it out. It doesn't help that the inner workings are obfuscated through multiple license agreements and hidden/closed source.

We are constrained by time. We can't invest our time into investigating every claim that we come across.

Drugs can help some, but they are not necessarily the answer in all cases (including this case).


nadam said 10 seconds, not minutes.

We are using the starter version combined with DynamoDB in production, and we found the payment structure very clear; no obfuscation whatsoever, unlike a Microsoft or Adobe pricing matrix ;) (It's made by the same guy who made the very open-source Clojure programming language, btw, and he is very much against obfuscation.)

Anyway, it's a competitive advantage for those guys who are building a bank on top of it in Brazil (https://www.youtube.com/watch?v=7lm3K8zVOdY). We feel we can avoid writing a lot of authorization and audit-log code by using Datomic; maybe you can save such work too.


Are you "Da Tom" in datomic? I'd suggest you be a tad more polite to those who ask honest questions or make statements of opinion here. And if you are "da tom", do you really want to alienate a potential customer or should you take their message to heart? I'm not going to bother looking at your page (and I don't need medication for it), but I'd guess there's no elevator pitch.


A datom in Datomic is not a person; it's a 5-value entry: [entity, attribute, value, transaction, added].

Just to clarify, since you appear to be linking 'onetom' to the project: I can't see any evidence she is in any way related.


I've clearly failed in my attempt to be humorous (while making a point I hope) ... Sorry for the confusion!


> Before that I also often came across as an asshole

Only before that, eh?

Also, mind disclosing the fact you are blatantly advertising your own services?


I wish it were my own service. :D

How did you come to this conclusion? From smoyer's comment? (btw, thx smoyer, nice joke. :)

Also I wrote about ADHD on HN in the past. I really didn't mean it in an offensive way: https://news.ycombinator.com/item?id=3064846


DynamoDB is literally garbage. That's why Amazon does not provide any SLA for the service... Even cheap Azure Storage provides cross-region failover.


Garbage, you say? DynamoDB blazed a trail for many open-source, eventually consistent KV databases. Certainly not garbage.


The only thing DynamoDB does well is simplicity. Apart from that, even MongoDB has tons more features than DynamoDB, and the new version resolved the performance problems that existed in previous versions.


Apples and oranges. People who call tools like this garbage fail to evaluate trade-offs at their required complexity space. Distributed systems have extreme trade-offs. More features => more bugs.


MongoDB, really? You are comparing a software product to a multi-region global data service? This is just not a great comparison. If you built a distributed global data service on MongoDB, you could compare that to DynamoDB.


I have a feeling you have no clue what you are talking about, have never read the Dynamo paper, and have never tried to use it in a production environment.


?


What is happening to the stock price? Oh, it's Sunday; forgot we can't trade.


I don't recall any outages having a material effect on Amazon's stock price.


Amazon can apparently go up or down 50% depending on whether a butterfly sneezes in Australia (it's danced wildly between 284 and 580 in the past 12 months alone).

In the background of such high volatility, it would be hard to pinpoint such a material effect from such a small disruption (in the big picture of course, I'm betting there are some pretty angry customers today due to the loss of a few sigma of reliability from this outage alone).

Now if a study were published indicating customers were switching providers over incidents like this, then I think you'd have some material evidence. But is anyone else doing better? Azure was out for 12 hours last year, apparently...

http://www.datacenterknowledge.com/archives/2015/01/23/cloud...



