- Databricks?
- HDInsight
- Data Factory
- CycleCloud
- SignalR (does that have anything to do with the R programming language?). I know what SignalR is, but if I weren't a C# developer I wouldn't.
I'll give you that most of the names make sense, but it will still take me the same year to become confident I'm architecting the correct solution on Azure that it took me on AWS.
S3 as "Amazon Unlimited FTP Server": S3 has nothing to do with the FTP protocol.
VPC as "Amazon Virtual Collocated Rack": VPCs have nothing to do with colocation (they can span AZs), much less physical racks.
Lambda as "AWS App Scripts": really disagree with this one. Their intent is to be microservices or event handling functions, not scripts. "Scripts" isn't really descriptive but usually implies a manually invoked automation that modifies some kind of stateful resource (such as files). Lambda is basically the exact opposite.
There are lots more that are problematic or overly simplified as well.
I don't say this to crap on the effort, just... be careful if you're relying on this for your understanding of what AWS does.
All of the descriptors used for the AWS stack had a similar ring to them, with an intended audience of someone a little older and with at least a little legacy knowledge about tech. Oddly, I could see this being useful for two use cases.
1- Explaining something to my mom, who's now over 60 and no longer in the know with regard to tech. She might remember what FTP or colocation is.
2- Explaining to an older, very siloed technical person; I'm thinking enterprise/gov networking somewhere they don't get exposure to newer cloud stack stuff (banking, healthcare, FEMA).
Mostly because FTP has been considered insecure and kinda deprecated since 1999, and we've had better things (e.g. SFTP) since.
So, for somebody in their forties it might be a better name, but for somebody younger, it's definitely misleading.
Apparently APIs are really hard for most companies.
I know that S3 has nothing to do with FTP, but it's a nice analogy that I can relate to. Get off your high horse.
- Binary Storage
- Blob Storage
- File Storage
- Reliable Storage
FTP is a specific protocol. Hard drives (as a sibling comment mentions) are a specific technology. Our discipline is confusing enough that names should be precise, not analogical.
"This is Santa Monica Cop. It's my innovative, original series about a hip, streetwise cop who brings his own brand of policing to a much wealthier area than he's used to."
'Oh, like ... Beverly Hills Cop, then?'
"Nah, you're not getting it! See, it's in Santa Monica!"
Yes, S3 doesn't natively allow retrieval/upload via the FTP protocol. But at the level of abstraction of "what does this service do for me, and why would I want to dig into the docs and incorporate it into my system?", the FTP analogy communicates the use case.
"This is S3. It's a way to map a bunch of keys -- which look like directories and naturally follow a kind of hierarchical structure -- to opaque blobs of data, allowing CRUD operations."
'Oh, like an FTP server, then?'
"Nah, you're not getting it! See, you talk to it with a different protocol."
This is why "simple descriptions" are hard. Leaving nuance behind makes a lot of room for confusion.
Also, I think the difference is more between filesystem and object storage. You are still uploading files to both.
Please keep the discussion respectful.
I see your approach in a lot of fresh graduates. They've learned things from the ground up, which means a good chunk of what they've learned is useless historical knowledge that doesn't actually teach you a lot about how the modern ecosystem of the web works.
That's time they could have spent learning Docker and authentication security. I mean, it's often the one guy/girl who's spent their free time setting up ADFS authentication in AWS for the fun of it who gets the job.
But why wouldn't you teach them that instead of teaching them legacy shit they'll never use, which also isn't really that handy for understanding how things work now?
We're supposed to stand on the shoulders of giants, and having struggled with Apache or, better yet, IIS is just so utterly useless.
Should they learn older technologies like Apache or IIS? No; learning those technologies teaches few, if any, transferable skills. Should they build a web server from scratch in C? Yes; that definitely teaches transferable skills, even if they later use Node or Go or whatever the latest web tech stack is. Should they also learn the latest web tech stack? Absolutely.
I think teaching Big O and focusing on efficiency is much better than teaching someone C, especially in the modern world where garbage collection isn't bad and memory is abundant.
I mean, we use a lot of Python and a lot of JS, and both are fairly inefficient on the tech side, but very productive on the human end - and human resources are a lot more expensive than memory.
If you spent one week writing something half as efficient as what you'd have gotten spending two weeks on it, that extra week of pay will still be paying for the additional hardware after you die of old age.
Not the best CS lesson of course, but having hired people who learned C before X, they really don’t seem to have learned the memory lesson anyway.
Big O is useful, but I think its usefulness is overblown. There are cases where, by understanding how the kernel allocates memory, how CPU caching works, or how networking works, you can develop an algorithm that is inefficient in terms of Big O but is still performant. Sometimes you can even beat the best theoretical algorithm. The reason is that the constant factor k is extremely important. If I can make k extremely small using my knowledge of the computer, and n is within reason, then Big O often doesn't matter, so I don't have to waste time implementing fancy algorithms. But I also have a better feel for when Big O does matter: if I can't make k small, or n is a huge number, or both, then I spend the time on algorithm optimization, and that's where understanding Big O is helpful.
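To make the k point concrete, here's a toy Python demo (results vary by machine, so take it as illustrative): for a tiny collection, an O(n) linear scan often beats an O(1) hash lookup, simply because hashing carries a bigger constant.

```python
# Toy demo: constant factors vs Big O. For very small n, a linear scan
# (O(n), tiny constant) can beat a set lookup (O(1), larger constant
# from hashing). Results are machine-dependent.
import timeit

small_list = list(range(8))
small_set = set(small_list)

scan = timeit.timeit("7 in xs", globals={"xs": small_list}, number=1_000_000)
hashed = timeit.timeit("7 in xs", globals={"xs": small_set}, number=1_000_000)

print(f"linear scan: {scan:.3f}s")
print(f"set lookup : {hashed:.3f}s")
```

Bump n up to a few thousand and the set wins by orders of magnitude - which is exactly the "feel for when Big O matters" I'm talking about.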
I will give you a real example with GPUs. Let's say you want to do some GPGPU work and you have data that is too big to fit in RAM. How do you process this data efficiently? Knowing how the GPU works is very important to solving this problem. You could spend days, weeks, years optimizing the crap out of Big O in your code, but it won't really move the needle in a lot of cases because that isn't the bottleneck. The primary bottleneck in this case is the PCIe bus, and a high-level understanding of how it works is needed to keep it full of data while the GPU is processing. Once that is solved, the next bottleneck will likely be the data format. The GPU is most efficient when each data sample is independent, which is related to how the GPU cores work, how caching works, etc. So putting the data in a GPU-friendly format (not always possible) will make everything go faster even before worrying about Big O.
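The shape of that fix looks something like the sketch below - plain Python, with placeholder functions standing in for real host-to-device copies and kernel launches, so it shows only the overlap pattern, not an actual GPU program:

```python
# Double-buffering sketch: start copying chunk i+1 to the device while
# chunk i is being processed, so the PCIe bus stays busy. Both functions
# below are hypothetical placeholders for real CUDA-library calls.
from concurrent.futures import ThreadPoolExecutor

def transfer_to_gpu(chunk):      # placeholder: host -> device copy
    return chunk

def gpu_compute(dev_chunk):      # placeholder: kernel launch + result
    return sum(dev_chunk)

def process_stream(chunks):
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(transfer_to_gpu, next(chunks))
        for nxt in chunks:
            current = pending.result()                 # copy of chunk i done
            pending = io.submit(transfer_to_gpu, nxt)  # start copying i+1
            results.append(gpu_compute(current))       # compute overlaps copy
        results.append(gpu_compute(pending.result()))
    return results

print(process_stream(iter([[1, 2], [3, 4], [5, 6]])))  # [3, 7, 11]
```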
What C does is it forces you to learn how the computer really works because it is hard to be productive in C without that knowledge. And that knowledge is largely transferable.
I certainly agree with learning and using Python or JS when it makes sense, because they are very productive languages. But from an education perspective, people who only learn JS are less likely to be able to solve the really hard problems, because they lack that foundation of how the computer works. And there are plenty of jobs where that doesn't matter too much, but you should probably have at least one person around who really understands how the computer works for the times when it does.
If I came to you not knowing anything about the vocabulary used in your field of expertise is that your field’s fault or mine?
“Simple Storage Service” is one of those products that they actually named sensibly.
I guess the article just had to make up an alternative.
S3 has nothing to do with the FTP protocol, but if someone completely new to AWS and similar platforms asked "What's this S3 thing?", you might say "Think of it as sort of like an unlimited FTP server." You can connect to it through programs like CrossFTP, browse a folder hierarchy, upload files, link those files from a website, and in general do all the things people commonly use (S)FTP for, with the same tools they use for (S)FTP stuff. It's similar enough that it makes a good starting analogy.
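To show what I mean, here's a minimal boto3 sketch (bucket and key names are made up) where each operation maps straight onto the FTP mental model:

```python
# "S3 as unlimited FTP server": upload, "list a directory", download.
# Bucket and keys are hypothetical; credentials come from the environment.
import boto3

s3 = boto3.client("s3")

# Upload a file, like pushing one over FTP.
s3.upload_file("report.pdf", "my-example-bucket", "reports/2019/report.pdf")

# "List a directory" -- really a key-prefix query, but it feels the same.
resp = s3.list_objects_v2(Bucket="my-example-bucket", Prefix="reports/2019/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download it back again.
s3.download_file("my-example-bucket", "reports/2019/report.pdf", "copy.pdf")
```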
App Scripts is another description I think is reasonable. Personally, I don't think manual invocation or modification of a stateful resource is key to the term 'script.' I think of a script as a relatively simple program that is invoked, accomplishes one task, and then ends without interaction. "Erase duplicate files in folder X" and "send me an email listing all files added today" are examples of scripts you might run as a cron job or in response to an event (like free space dropping below some level). Lambda is close enough to this that I think "Lambda's where you store scripts and define when they get run" is a reasonable explanation.
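For instance, the "email me a list of files added today" script as a Lambda handler might look roughly like this (the bucket name and topic ARN are made up, and the cron trigger would be configured separately via CloudWatch Events):

```python
# A "script" running as a Lambda: list objects added today in a bucket
# and report them via SNS. All names/ARNs are hypothetical placeholders.
import datetime
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

def handler(event, context):
    today = datetime.datetime.now(datetime.timezone.utc).date()
    resp = s3.list_objects_v2(Bucket="my-example-bucket")
    new_keys = [obj["Key"] for obj in resp.get("Contents", [])
                if obj["LastModified"].date() == today]
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:file-report",
        Subject="Files added today",
        Message="\n".join(new_keys) or "No new files.",
    )
    return {"count": len(new_keys)}
```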
It is nothing like FTP. It is more of a database than file storage nowadays. You can literally use SQL to query data stored in S3 (S3 Select - see the sketch at the end of this comment), and S3 is literally used as a database when you store JSON or CSV files. If this is all you have, the costs are virtually nonexistent - but nobody thinks about S3 this way. I personally have had countless clients and developers use it as an FTP server and be scared to make thousands of queries a second, thinking they were dealing with a hard drive.
This type of thinking limits you for no good reason.
Yes, AWS is vast and complicated - and sure, you need a starting point - but you need to be careful when you try to simplify AWS too much, because you'll limit yourself and others from taking advantage of what AWS can actually do for you.
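Here's the S3 Select sketch I mentioned (bucket/key are made up; it assumes a CSV object with a header row):

```python
# SQL straight against an object in S3 -- no database server involved.
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-example-bucket",
    Key="data/users.csv",
    ExpressionType="SQL",
    Expression="SELECT s.name, s.email FROM s3object s WHERE s.country = 'IT'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the result bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```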
All those abbreviations should be written out on that webpage.
- Amazon Transactional Email vs Simple Email Service (SES)
- Amazon Queue vs Simple Queue Service (SQS)
- Amazon Github vs CodeCommit
I always disliked most of the acronyms. The worst for me was Elastic Beanstalk, a name allegedly picked by Bezos himself. I thought that choice was really poor, considering that most of the world doesn't know what Beanstalk refers to.
This is what happens when you grow up in an English-speaking culture and assume the world functions like yours. I had the luck of being from a different culture (Italy), and therefore an easier time discerning certain things.
It doesn't mean that this "type" of fairy tale is unknown; in fact, its origins might trace back 5,000 years.
In addition, "Jack and the Beanstalk" is an Aarne-Thompson tale-type 328, The Treasures of the Giant, which includes the Italian "Thirteenth" and the French "How the Dragon was Tricked" tales.
There are also usually two other comments:
1. Why isn't it funnier / meaner? (an early version had a couple more pointed descriptions).
2. Why aren't these names more accurate?
The answer to both is that I was trying to help people form a rough mental model of what, how, and in what context they would use the services - one they could then take as a jumping-off point for doing their own research.
Nobody is relying on my jokey 2 sentence description of these services to make deep architectural choices about their apps. What I have heard many many times though is: "Oh! I didn't realize that's what AWS Service X did."
This is exactly what I'm trying to do with Hackterms - would love it if you could check it out and maybe contribute AWS definitions. We're up to 1200+ from hundreds of contributors like you.
(though probably with a lot less snark)
A lot of old hacker culture can be learned from it. It might actually be a good thing that it's not updated anymore.
EC2 servers are "ephemeral". It's believed the operators glossed over the significance of that word when architecting their system, and then they restarted their servers.
Sidenote: MtGox acquired them and made the users whole. Everyone wondered how they could do that. Maybe they never really did.
- Should have been called Amazon Beginning Cut Pro
- Use this to deal with video weirdness (change formats, compress, etc.).
"Amazon SQL" makes it sound like Amazon's bespoke SQL database. RDS is hosted Oracle, SQL Server, MySQL, MariaDB, and Postgres. They all behave so differently that it really doesn't make sense to combine them into one service.
Don't forget to see the DirectConnect description.
Let me explain why I believe they are doing this, at least in part (and Jeff Barr can chime in).
AWS does this so that people learning cloud computing from the start - "children" - get used to the Amazon (trademarked) term, and then it's harder for them to leave.
Because all of the other things they would need to do elsewhere have unfamiliar terms, while the Amazon term is what resonates in their head - "the language they speak".
So the brand name is the moat.
Meanwhile, if you are Linode or Rackspace and you call it a VPS, then it's easy to find an alternate provider of a VPS somewhere else.
Business-wise it's on purpose, and honestly (it wouldn't work for everyone, I might add) it's really smart.
So when I hear the term VPS I don’t think cloud, I think ‘inflexibility’ and ‘manual setup’.
The brand name isn’t even a puddle. The moat is the APIs.
ls    list_files    Lists files in a directory
cd    change_dir    Changes the current working directory
cat   cat_barf      Takes stuff in and immediately hacks it up onto stdout
My understanding is that when you catenate A and B you concatenate A with B.
That is, the catenation of A and B is AB, but the concatenation of A and B is A (catenated with what - we haven't said) and B (ditto).
Of course this is basically trivia and the word that's used is concatenate.
cat   concatenate   Concatenates all files given as arguments as well as stdin
I’d rather google “ebs <issue>” because it’s pretty unique. Google eventually picks up on your preferences for non-unique names so not the worst thing though.
I appreciate it for this.
As a developer, I've tried to research Django-related items before (The Python Django). But as a guitarist, I've also done a lot of research about Django (Reinhardt). Getting the right Django is sometimes confusing when Google knows you have a deep interest in both unrelated items. Codenames are great like this if they're not reused.
I can't even find articles which I know exist using Google and nearly exact names.
"Amazon connect best practices troubleshooting" gets you to other AWS services like direct connect.
This is the only service name that truly bugs me.
This is a fantastic resource, I wish I'd had it when I first dived (dove?) into AWS. Maybe these shouldn't actually be names, but it would be nice to have nice descriptions like this in the AWS documentation.
S3 is simply an object store, which is a fairly common term. So "Unlimited FTP Server" is less obvious to me, and sounds old, outdated, and a security risk (although S3 can be that, too). Lambda is functions-as-a-service, or serverless. "App Scripts" would also tell me nothing, or sound like something for Office. SNS's main use case for us is not to "send mobile notifications, emails and/or SMS messages", but to link two systems together where some loss is acceptable, so "Amazon Messenger" is worse. SQS and SNS fit together well.
SNS is for producers. You use SNS to say something happened. You have no guarantee that any consumer actually successfully processed the message, and if the consumer is down, that message is lost to them.
The only way a consumer can actually directly process a message is by subscribing via HTTP or Lambda. But again, if the consumer errors or is down, you're out of luck.
SQS is a traditional simple queueing mechanism. It has no fanout capability on its own but you do get the traditional queuing functionality. But it doesn’t make sense logically for more than one process (or group of processes doing the same thing) to consume the queue.
If you want the traditional fanout, filtering, multiple queues that do different things on the same event/message, you use SNS and SQS together.
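Wiring that up is only a few calls. A hedged boto3 sketch (names are made up; the SQS queue policy that authorizes SNS to deliver to the queue is a real-world requirement I'm omitting for brevity):

```python
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="orders")["TopicArn"]

queue_url = sqs.create_queue(QueueName="orders-billing")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Fan out: the producer only ever talks to SNS...
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
sns.publish(TopicArn=topic_arn, Message='{"order_id": 42}')

# ...while the message waits in SQS even if the consumer was down.
msgs = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
```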
For this sort of use case, I've generally opted to go with a Kinesis stream, and I'm having a bit of a hard time understanding why a mix of SNS and SQS would be better here.
You can send an SNS message with attributes and subscribe to it with any combination of SQS queues, Lambda functions, emails, HTTP endpoints, etc., and you can design any of those subscriptions so that they only get messages matching attribute conditions (see the sketch just below).
Also, with SQS you get the standard granular retries, dead-letter queues, etc. Yes, with Kinesis you can do shards, but it really doesn't make sense to have more than one process reading messages from the same shard to scale out processing. With SQS, you can autoscale instances reading from the queue based on queue size, or just subscribe the SNS topic to an SQS queue, subscribe that queue to a Lambda, and let AWS work its magic.
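The filtering I mentioned is just a subscription attribute. A sketch, assuming a topic and queue like the ones above (the ARNs here are placeholders):

```python
# Attribute-based filtering: this subscription only receives messages
# whose "event_type" attribute is "refund". ARNs are hypothetical.
import json
import boto3

sns = boto3.client("sns")
topic_arn = "arn:aws:sns:us-east-1:123456789012:orders"          # placeholder
queue_arn = "arn:aws:sqs:us-east-1:123456789012:orders-billing"  # placeholder

sub_arn = sns.subscribe(
    TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn,
    ReturnSubscriptionArn=True,
)["SubscriptionArn"]

sns.set_subscription_attributes(
    SubscriptionArn=sub_arn,
    AttributeName="FilterPolicy",
    AttributeValue=json.dumps({"event_type": ["refund"]}),
)

# Publishers tag the message; non-matching messages never hit this queue.
sns.publish(
    TopicArn=topic_arn,
    Message='{"order_id": 42}',
    MessageAttributes={
        "event_type": {"DataType": "String", "StringValue": "refund"}
    },
)
```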
This actually has me rethinking the architecture on a project I'm working on right now. It looks like Kinesis would be a little bit cheaper at the volume of data I'm looking at... But the SNS/SQS method will let me sidestep some potential future scale-up concerns I had with the 5 reads per second limit without making a Rube Goldberg machine of Kinesis Analytics feeding into additional Kinesis Streams, which would drive the cost up higher than an SNS/SQS fanout.
But with Kinesis, it's true that you can do only 5 reads per second, but each read can return up to 10,000 records. With SQS, you can only get 10 records per call. I would think you could get much higher throughput with Kinesis; you would just have to store your iterator/sequence number per shard somewhere, so you know where you left off in case of a crash.
Kinesis is much better for higher throughput, and you can always scale up instead of out if you need consistent throughput. It depends on your use case.
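For reference, the big-batch shard-reading loop looks roughly like this (stream name is made up; process/checkpoint are stubs, and real code would persist the sequence number somewhere durable like DynamoDB):

```python
# Reading a Kinesis shard in large batches: up to 10,000 records per
# GetRecords call. process()/checkpoint() are placeholder stubs.
import boto3

kinesis = boto3.client("kinesis")

def process(data: bytes) -> None:
    print(len(data))   # placeholder for real work

def checkpoint(seq: str) -> None:
    pass               # placeholder: persist progress durably

shard_iter = kinesis.get_shard_iterator(
    StreamName="example-stream",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while shard_iter:
    resp = kinesis.get_records(ShardIterator=shard_iter, Limit=10000)
    for record in resp["Records"]:
        process(record["Data"])
        checkpoint(record["SequenceNumber"])
    shard_iter = resp.get("NextShardIterator")
```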
I know of a few workarounds - if the data is also being sent to S3, having some services working off of S3 events instead of the stream directly, or using Kinesis Analytics to fan out to additional Kinesis streams.
I might also just hook up SNS to the stream via Lambda and have them fan out from SNS. Hmm.
Event source -> SNS -> SQS -> Lambda, and set the concurrency limit of the Lambda.
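Concretely, the "set the concurrency limit" step is one API call, plus the SQS-to-Lambda wiring (the function and queue names here are hypothetical):

```python
# Cap the consumer's parallelism, then let AWS poll the queue for you.
import boto3

lam = boto3.client("lambda")

# Reserve a fixed concurrency so downstream systems aren't overwhelmed.
lam.put_function_concurrency(
    FunctionName="order-consumer",
    ReservedConcurrentExecutions=10,
)

# Event source mapping: AWS polls the queue and invokes the function.
lam.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:orders-billing",
    FunctionName="order-consumer",
    BatchSize=10,
)
```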
I always wished that SQS/SNS/Kinesis/et al. could all be grouped under an "AWS Pub/Sub" brand that encompasses those features. I understand how those systems interact, but damn, it can be confusing to someone who's new.
That being said, I made it an edict on my team that no process could put anything directly in an SQS queue, they had to use SNS and subscribe it to a queue.
But, keep making things obtuse. The more obtuse things are with AWS the more money I will be able to charge in my next life as an overpriced AWS Architect.
I also love the sponsorship message:
> Hey, this is sponsored by SendCheckIt - and by "sponsored" I mean that's what I've been working on instead of updating this list.
Amazon EC2 Queue? How about Amazon State Machines? Or Amazon BPM?
The list is a bit outdated, and as for Amazon State Machines, that name is better suited to AWS Step Functions, the successor of SWF (and defined as an actual state machine). Step Functions do cover lots of SWF use cases (rightfully so, as a successor), but also more, like service orchestration, etc.