The syntax looks quite the same, but Amazon's awscli Python installer has loads of dependencies. I'll have to see if it's worth switching.
Anyone already know if Amazon's new CLI thing has any big advantages over Tim Kay's Perl aws?
Why are the dependencies a problem? By combining a handful of smaller, focused modules that each do something well, you can end up with something better than if you were to re-invent the wheel for every need.
AWS and the Python dev team are doing a heck of a job on botocore, and have cranked up the pace of improvement in the last 6 months. This CLI reaching "official" status will guarantee (at least until further notice) that it will see updates and fixes. It's likely to see early or earlier support for new AWS services.
`pip install awscli` just installed 26 other modules besides awscli. Now I feel a little obliged to go check out those 26, as well, to see what they are.
I agree about not re-inventing the wheel. But the amount of stuff installed is definitely a factor worth considering when choosing between two seemingly identical scripts.
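If anyone else wants to eyeball that list, pip will print it for you (run inside whichever environment you installed into):

$ pip show awscli   # package metadata, including its direct requirements
$ pip freeze        # everything installed in the environment, those 26 modules included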
So? Use a virtualenv and stop worrying. Those 26 dependencies will be separately updated and maintained; who knows what warts are sitting in the monolithic Perl scripts.
There are pros and cons to both sides.
Would this make you feel more like you knew what was installed on your system? Would you have felt the need to look at the source code, including the source code for any embedded dependencies, in, say, a 'vendor' or 'lib/dist' directory or something?
What are the pluses and minuses of each approach? After weighing them, do you still have a problem with the 'new way' of doing things, where a program installs along with explicit dependencies via pip (or rubygems in Ruby), separated out in a different place in the file system, vs. embedded/bundled dependencies?
I'd love to have a simple way to package python apps that depend on other python and native libraries without having to install things separately.
Bullshit. Python supports having modules installed into local locations (see virtualenv).
Just:

$ virtualenv ~/.local/lib/aws
$ ~/.local/lib/aws/bin/pip install awscli
$ ln -s ~/.local/lib/aws/bin/aws ~/.local/bin/aws

and put `~/.local/bin` into your PATH.
Java's CLASSPATH causes enormous pain for end-users. Just read the Hadoop mailing list. The fact that Java doesn't have a sane default for where to put anything or how to manage dependencies is a huge flaw.
Back when I was doing AWS, I just used the C binaries (I forget what they were called) to transfer things to or from S3. I just wanted to avoid installing hundreds of megs of dependencies. We paid money to transfer our AMIs around, after all! Still, a more full-featured tool will no doubt come in handy in some scenarios.
Hadoop jars are hundreds of megabytes, and we have multiple daemons. Duplicating all those jars in each daemon would multiply the size of the installation many times over. That's also a nontrivial amount of memory to be giving up because jars can no longer be shared in the page cache.
Some of these problems could be mitigated by making Hadoop a library rather than a framework (as Google's MR is), or by pruning unnecessary dependencies.
I do sympathize; it's a shame that something akin to the Maven repository and dependency mechanism hasn't been integrated into the JDK. I was on the module JSR and continually pushed them to do something like that, but it turns out IBM would rather have OSGI standardized, and so it deadlocked. Maybe something will come in JDK 9.
Sad email thread from years ago: http://markmail.org/message/y2a6nzhcsp62p5yv
Right now, we have several Maven subprojects. Maven does seem to enforce dependency ordering-- you cannot depend on HDFS code in common, for example. So it's "modular" in that sense. But you certainly never could run HDFS without the code in hadoop-common.
None of this really has much to do with CLASSPATH. Well, I guess it means that the common jars are shared between potentially many daemons. Dependencies get a lot more complicated than that, but that's just one example.
Really, the bottom line here is that there should be reasonable, sane conventions for where things are installed on the system. This is a lesson that old UNIX people knew well. There are even conventions for how to install multiple different versions of C/C++ shared libraries at the same time, and a tool for finding out what depends on what (ldd). Java's CLASSPATH mechanism itself is just a version of LD_LIBRARY_PATH, which also has a very well-justified bad reputation.
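To make that concrete, here's roughly what those conventions look like on a typical Linux box (the binary, library names, and version numbers are purely illustrative):

$ ldd /usr/bin/curl              # what depends on what, resolved system-wide
        libcurl.so.4 => /usr/lib/libcurl.so.4 (0x...)
        ...
$ ls /usr/lib/libcurl.so*        # multiple versions coexist via soname symlinks
/usr/lib/libcurl.so  /usr/lib/libcurl.so.4  /usr/lib/libcurl.so.4.3.0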
I don't know of anyone who actually uses OSGI. I think it might be one of those technologies that just kind of passed some kind of complexity singularity and imploded on itself, like CORBA. But I have no direct experience with it, so maybe that is unfair.
I like what Golang is doing with build systems and dependency management. They still lack the equivalent of shared libraries, though. Hopefully, when they do implement that feature, they'll learn from the lessons of the past.
Indeed, an officially supported, friendly, easy-to-use CLI is long overdue.
Mitch Garnaat, who built and maintained boto over the years, was picked up by Amazon last year and has since been building out botocore, which the aws-cli tools use under the hood.
I noticed it had happened sometime last year, and thought I'd share what little I know.
Command-line tools used to be your only option for managing AWS, and Amazon always creates its API and shell tools before the Console.
$ aws ec2 describe-instances
Pretty strange decision IMO.
I did have to break out `cli53 rrcreate`.
Might lose some favour for new projects, though.
Reinstalled and now they have shorter, saner commands and real help pages... and they also changed the pager from 'less' (my default pager) to the crappy 'more'... because (apparently) you never want to go back a page in a help file. More detailed content, but it's harder to review. Odd.
$ PAGER=/usr/bin/less aws help
Great to hear that this is built on the awesome boto library. It will serve as a useful reference for boto developers.
One API point that I've found lacking in boto is a "sync" command for S3. Take a source directory and a target bucket and push up the differences à la rsync; that's the dream. Boto gives you the ability to push and get S3 resources, but I've had to write my own sync logic.
So, the first thing I went digging into is the S3 interface of the new CLI, and to my surprise, they've put a direct sync command on the interface, huzzah! Their implementation is a little wacky though. Instead of using some computed hashes, they are relying on a combination of file modtimes and filesize. Weird.
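For anyone who hasn't tried it yet, the invocation is along these lines (bucket name made up; check `aws s3 help` for the exact options):

$ aws s3 sync ./local-dir s3://my-bucket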
Anyways, glad to see AWS is investing in a consistent interface to make managing their services easier.
One disappointing issue is that the listing process on CF is an order of magnitude faster than on S3.
CF: real 2m7.628s
S3: real 14m15.680s
If you're looking for something a little more robust, I just released this a couple of days ago-
It's still at a really early stage, but I've been using it to sync and configure my personal site hosted on S3, and it's worked well so far.
* Why does each one of them use different parameters for the same stuff? WHATEVER_URL could be REGION (WHATEVER = EC2, ELB, ETC, ...). One uses a config file for ACCESS_KEY_ID, another one wants an environment variable. Plus they use different names for common stuff.
* Why are the command-line arguments named inconsistently across these tools? --whatever vs. --what-ever
* Why don't they fail with a non-zero exit code on command errors? Right now that only happens if there's a configuration problem; if you send a command and there's a problem (like S3 access denied), it still returns 0.
* Why don't they provide synchronous commands? Right now I have to do the polling myself. Super annoying.
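To give an idea of the polling I mean, it ends up being something like this (the instance ID is made up, and grepping the JSON output for the state name is crude, but it shows the shape of it):

$ until aws ec2 describe-instances --instance-ids i-12345678 | grep -q '"running"'; do sleep 10; done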
Anyway, I've been using the ones included in Amazon Linux - I hope they were the latest version. If the new version fixes these problems, feel free to correct me :)
More startups should realize that they could increase developer adoption of their products if they also published shell-script interfaces to them. In fact, your startup should really start off as command line accessible and add the GUI after.
> In fact, your startup should really start off as command line accessible...
I would argue that startups should always start at the API level and work upwards. At least that's what we did with AWS.
As much as I'd love this to be true, it is so wrong unless you are targeting other very technical users.
Most people I know (i.e., potential customers) think anything in the command line is old/impossible to use/nerdy.
• crap ton of dependencies, making it a pain to install on Arch https://aur.archlinux.org/packages/aws-cli/ Hint: static build ffs
• broken help. `aws s3 help` instead of `aws help s3` which is more natural to git users
• no glacier support
• no s3 progress for downloads or uploads
• many jarring UX issues https://github.com/aws/aws-cli/issues/305 https://github.com/aws/aws-cli/issues/304
I'm the author of glacier-cli (github.com/basak/glacier-cli). I'd be happy to see it move into aws-cli. If anyone wants to do this, please get in touch to coordinate.
* EDIT: that is, use Asgard because it does a great job of managing autoscaling groups.