Then there's how the data is sent. The metrics are converted to JSON, gzipped, then AES encrypted with a random key. The random key is then encrypted with a constant public key. The encrypted key and encrypted payload are serialized into some JSON, which is then POST-ed to an HTTPS URL. This seems unnecessarily convoluted, and even with my meager knowledge of crypto I already see some problems (compressing then encrypting is a no-no) which could spell trouble. Shouldn't you just need to upload the JSON of the metrics over an SSL connection?
I'll probably never use this just on principle.
Note that we got the feedback and disabled the collection until we do it right.
Also, it got "caught" in the first comment about that because it was clearly written in a Disclaimer section in the README.
Yeah, I don't agree either. It's annoying.
Metrics are converted to JSON, true, gzipped and AESed with a random key. It’s true.
The random key is then encrypted with our public key. True as well, and perfectly fine.
“This seems unnecessarily convoluted”
No, this is the proper way to do it!
We don’t want to send data in cleartext. We don’t want to store data in cleartext on cloud servers either. The statistics that we collected (PS: they are not collected anymore; we disabled the collection until we take the time to explain what we collect and, most importantly, why) were then downloaded and analysed locally. AES encryption is perfectly fine and necessary in that case. If we relied only on TLS, as the comments ask, the statistics would be accessible to anyone who has access to our AWS instance and infrastructure.
“compressing then encrypting is a no-no”
This sounds like a rule derived from one very specific attack (the CRIME attack, as explained in comments down the thread). However, this attack does not apply in this case. A long discussion ensues on how we are supposedly incapable of implementing crypto, although it rather seems the comment author is mixing things up.
Takeaway here: judge for yourself! We want to considerably simplify how AWS infrastructures are created and managed with awless, so give it a look. From version 0.0.14 on, no data will ever be collected without your consent.
Perhaps it is just my own meager understanding of cryptography, but I didn't know of anything that would make this a bad idea. Can you explain why it is potentially a problem?
An example is the CRIME attack. But that involves a chosen plaintext attack, so I'm not sure if something can be done with this method.
The research is published with information that is far too low-level. Very few software developers, including the vast majority of engineers with degrees, understand the theory and math behind these issues. The best of the worst of us know not to roll our own crypto, but that is clearly the tip of the iceberg. Someone out there needs to figure out how to properly explain "Crypto for Dummies" if we ever want or expect the overall security of encryption to improve.
I would say that "compress, then encrypt is bad" is the wrong message to take away from this type of vulnerability. In the case of CRIME in particular, the issue was that:
1. The attacker provided part of the message.
2. The rest of the message contained a secret.
3. The entire message (attacker-provided part and secret) was compressed together.
We can stop there; the length of the compressed data now contains information about the similarity between the attacker-provided content and the secret.
The correct lesson to take away from this is "do not compress a combination of attacker-provided content and secrets". Compressing before encrypting is perfectly sensible. (And, by the way, compressing after encrypting isn't better, it is useless since your encrypted content ought to be incompressible.)
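For anyone who wants to see the length leak for themselves, here is a minimal Go sketch. The secret and the two guesses are made up, and it uses compress/flate from the standard library purely for illustration; none of this is taken from awless. Encryption is left out because it hides content but not length, so the compressed size is what an observer would effectively learn.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
)

// compressedLen returns the DEFLATE-compressed size of an attacker-supplied
// guess placed in the same message as a secret. The message layout
// ("guess=...&...") is a made-up example.
func compressedLen(guess, secret string) int {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestCompression)
	if err != nil {
		panic(err) // only happens for an invalid compression level
	}
	fmt.Fprintf(w, "guess=%s&%s", guess, secret)
	w.Close()
	return buf.Len()
}

func main() {
	secret := "sessionid=7f3a9c2e1b4d8a6f" // hypothetical secret
	right := compressedLen("sessionid=7f3a9c2e1b4d8a6f", secret)
	wrong := compressedLen("sessionid=0123456789abcdef", secret)
	// The matching guess lets the compressor replace the secret with a
	// back-reference, so its output is shorter.
	fmt.Println("matching guess:", right, "bytes")
	fmt.Println("non-matching guess:", wrong, "bytes")
}
```

Both inputs have the same uncompressed length, so any difference in output size comes only from how much of the secret the guess shares. If the attacker cannot inject content into the compressed message, there is nothing to compare against, which is the point made above.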
Sounds like a standard hybrid approach. https://en.m.wikipedia.org/wiki/Hybrid_cryptosystem
Plus, hybrid cryptosystems exist because symmetric encryption is much faster than asymmetric, which matters for large amounts of data. But this is (even before compression) probably only about a kilobyte of data. Why have the extra complexity?
Looks cool, but this is an instant no for me. Sorry guys.
userid and accountid stored in database here:
retrieved by stats here:
Added to stats payload here:
The hash functions are totally irreversible, so it is impossible to get back to the original identifiers.
We added these anonymous ids in order to know which commands are the most used per user.
Anyway, if you have better ideas on how to manage this, feel free to make a pull request or create a Github issue. And if you prefer to disable it, you can also do it easily with the source code (you just need to comment a few lines).
Edit: We opened an issue for this topic on our Github repo: https://github.com/wallix/awless/issues/38 . Feel free to continue the discussion there.
`awless` collects account number hashes. AWS account numbers are 12 decimal digits long, meaning there's a total of 10^12 unique values. Values are anonymized before submission using a single round of SHA256, so in ~2^40 hash operations, anyone with your database of hashes can invert every single account number.
For comparison, the bitcoin blockchain presently has a hash rate of ~2^61 SHA256 hashes per second. (Edit: I incorrectly stated 2^41 based on a hash rate of 3 TH/s, when it's actually 3 million TH/s.)
acct  has hash [d2a52833a6e434d2a55be0ce852c2dd9c5260c49a7c28ea4fa3fe2ac6d054d7e] (the last one it finished in 10 seconds)
A little effort with a decent GPU + hashcat though, would take this exercise down to a few minutes.
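The search itself is only a few lines. Here is a minimal single-threaded Go sketch, assuming the stored value is the hex SHA-256 of the 12-digit account number written as a decimal string (the exact input format awless hashes is my assumption, and the account number below is made up). The range is kept tiny so it finishes instantly; covering all 10^12 values is the same loop with more time or more cores.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashID returns the hex-encoded SHA-256 of an account ID string.
func hashID(id string) string {
	sum := sha256.Sum256([]byte(id))
	return hex.EncodeToString(sum[:])
}

// invert tries every zero-padded 12-digit ID in [from, to) and returns the
// one whose hash equals target, if there is one.
func invert(target string, from, to uint64) (string, bool) {
	for n := from; n < to; n++ {
		id := fmt.Sprintf("%012d", n)
		if hashID(id) == target {
			return id, true
		}
	}
	return "", false
}

func main() {
	target := hashID("000000123456") // made-up account ID
	id, ok := invert(target, 0, 1000000)
	fmt.Println(id, ok)
}
```

Because the input space is so small, an unsalted fast hash acts as a lookup key here rather than as anonymization.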
PBKDF2, bcrypt, and scrypt are all used where a database needs to store something and check for equality, but where the values in the database need to not be reversible even if the database is breached. They might be suitable here.
Your claim that you are using an irreversible hash is not comforting.
Your forced data collection is also not comforting.
That throws off their statistical analysis. Random cookies generate a new cookie for each new install or re-install, inflating the "users" count. If someone installs this on five different servers, the stats under random cookies will show five separate streams of data, and they will draw the improper conclusion that a particular operation used on all of those servers is five times more popular than it really is. A configuration flag to disable the data collection is reasonable, but using a well-known hash like Whirlpool to anonymize the data stream is also reasonable.
If someone doesn't like data collection, then they shouldn't use cloud products, and they should just as vociferously declaim cloud services. With cloud services, whether or not the usage data collection is anonymized is at vendor discretion, but here, you control the source. Using a utility for a cloud service, and complaining about usage data collection, is ironic, considering AWS surely collects the same data.
Well of course they do, since all of these commands send off calls to AWS servers. And if you're using AWS products you already trust Amazon; that doesn't mean you trust a random person who put some code on Github.
Yes, we do collect minimal anonymised statistics in the sole goal of improving awless. All the statistics code is here: https://github.com/wallix/awless/blob/master/stats/stats.go
As the project is Apache licensed, you're free to modify it if you don't want this. Also, if you're privacy-conscious you should use an application firewall on your client side, like Little Snitch etc., since much of the software that you install on your machine also does this.
However, the fact that the code is active at all will rule it out for some companies (firewall or not).
Perhaps make it something users can turn off in a config file? Not everyone can code in go, especially if their job is as a sysadmin, which isn't unlikely given that this is an infrastructure tool, so it might not be as simple as forking and editing the code for them.
Where I work, as long as the data collection code is in there, whether I can modify it or not, they won't allow it on our computers. I know this is not uncommon.
Dismissing this concern by saying "other software does this" while awless falls into a different category (small CLI tool) is also problematic.
My overall impression is that they don't do security very well.
A quick search of the repo for https:// and then http:// shows that the stats collection apparently uses HTTPS.
I'm considering learning Go, but the amount of 'return err' and 'return nil, 0, err' is an instant turn-off. Is this best-practice error handling in Go?
The only thing that could be done better is that, instead of always blindly returning an error, one could wrap them in higher-level errors and build a sort of error trace:
- task failed because
- authentication failed because
- could not load credentials because
- because file xy.pem is not readable
>instead of always blindly returning an error, one could wrap them in higher level errors and build a sort of error trace
Isn't that basically just reinventing exceptions, sort-of ?
Exceptions only contain a function call trace (stack of function calls), while a logical error trace is more like an explicit try/catch/wrap/throw around every call and could be more informative to the end user if done properly.
If you want to write it safely, you will have `if err != nil` everywhere.
This is normal and expected.
The best phrasing I've heard recently is "Go is useful drudgery".
For example, BuildStats is defined as returning (*stats, int, error), when it could name those and just use naked returns. In buildInstancesStats, they name the return values but then repeat them again on all 3 return lines instead of just using a naked "return".
Quick intro from the tour with a good example: https://tour.golang.org/basics/7
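To make the named-results point concrete, here is a small self-contained Go sketch. The function total is a made-up example, not code from awless:

```go
package main

import (
	"errors"
	"fmt"
)

// total declares named results, so a bare "return" sends back whatever
// sum, count and err currently hold.
func total(values []int) (sum int, count int, err error) {
	if len(values) == 0 {
		err = errors.New("no values")
		return // naked return: sum and count are still their zero values
	}
	for _, v := range values {
		sum += v
	}
	count = len(values)
	return
}

func main() {
	fmt.Println(total([]int{1, 2, 3}))
	fmt.Println(total(nil))
}
```

Whether this reads better than spelling out the values on each return is a matter of taste; it tends to work best in short functions like this one.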
It lacks a Makefile or any documentation in the tarball for the latest release (v0.0.13)...
I found some sparse documentation on the wiki, but that appears to fail. I tried: GOPATH=$(pwd) go build .
$ GOPATH=$(pwd) go build .
main.go:19:8: cannot find package "github.com/wallix/awless/commands" in any of:
/usr/src/github.com/wallix/awless/commands (from $GOROOT)
/home/rkeene/Downloads/awless-0.0.13/src/github.com/wallix/awless/commands (from $GOPATH)
Then do a "go get github.com/wallix/awless", "cd $GOPATH/src/github.com/wallix/awless" and "go install".
Then in the same prompt do "awless"
awless is currently in its early life. We also plan to support both CloudFormation (first) and maybe Terraform at some point. CF and TF are exhaustive but more complex than awless templates.
awless is meant to simplify how we can create and manage an AWS infrastructure (which is originally our own need at Wallix), and we wanted to have simple templates as part of the CLI.
However, the whole thread above pointing out the data collection issues makes me far less likely to be moving away from the official cli + bash magic any time soon
A Go CLI tool has some deployment advantages over Python.
- everything inside a VPC
- the siblings of an instance
We currently rely on https://github.com/google/badwolf for that.
awless also includes an easy-to-write template engine (vs. CloudFormation or TerraForm - which we also plan to integrate).
See more features in the README. Note that, according to feedback since launch, it seems that awless is noticeably faster than aws-cli. The latest version (that you build with go install) has no statistics, try for yourself!
The missing values in the template (aka "holes") will be asked for by awless, so you can have staging and production deployments.
Note that we just released the project last Friday, and in particular have an ambitious roadmap for the templates. For instance, we could password protect accessing some nodes, and prevent wrong actions on the production env.
Point me to at least a single popular SaaS product that does not have analytics in its page.
"SaaS" means software as a service, something someone else runs, generally web based where metrics are relevant, not a command line tool on your own machine talking to your own infra.
From the README:
Choose one of the following options:
- Download the latest awless binaries (Windows/Linux/macOS) from Github
- If you have Golang already installed, build the source with: go get github.com/wallix/awless
- On macOS, use homebrew: brew tap wallix/awless; brew install awless
Which of those sounds like SaaS?
According to you, it's OK to collect data if the software is on another person's infra.