There are a handful of services out there that do this; I know because I've needed it on multiple occasions. It's nice that Amazon is providing it in house now, but it just reminds me of the last time I went to re:Invent and walked through the vendor area and thought about how many of these companies are four dev cycles away from Amazon producing a baked-in competitor.
I wrote a part of our product, actually invested years in it, that does a certain thing I won't say on Amazon's cloud. One day, Amazon released a competing service that does the same thing.
There was a flaw in our product that had one of our customers pushing to use Amazon's offering. Turns out Amazon's service cost about 4x what it cost us to do it ourselves. It wasn't obvious at the time given Amazon's purposefully obtuse pricing.
Eventually we fixed the bug, brought our customer back in line with the rest, saved some money, and have continued providing this service much cheaper than Amazon. I think it works for us because of a few reasons:
1. The service we offer is challenging enough that most dev teams won't want to do it themselves; they'll outsource it
2. Amazon has little incentive to charge less (see #1) and little competition
3. We're small enough that we can still provide that face-to-face level of service and hand holding that's nearly impossible to get from a larger org
Amazon/Microsoft/Google may come into a particular market, but it doesn't automatically imply that they can (or even will) do it better/cheaper/faster.
I'm curious whether you think Amazon might have actually improved your business, because now you can offer anyone who buys from Amazon better service and a better price, if you can find them.
Sometimes developers choose a niche that's either directly in the path of the vendor, or even worse, on the roadmap of the vendor. In those cases, they don't really deserve our sympathy. It's almost like a game of PR, there's no way you're not going to have a fight on your hands.
Thanks. At the end of Spolsky's article he mentions a way to survive:
"A good platform always has opportunities for applications that aren’t just gap-fillers. These are the kind of application that the vendor is unlikely ever to consider a core feature, usually because it’s vertical — it’s not something everyone is going to want. There is exactly zero chance that Apple is ever going to add a feature to the iPhone for dentists. Zero."
That’s quite literally AWS’s playbook. Fill the vendor hall with tech, learn it, see what sticks and then crush it.
This is why I’m bullish on cloud-agnostic tech. These practices don’t typically fare well in the enterprise space. This is why companies like MSFT are interesting to me. They partner and rarely kill. Amazon is the complete opposite.
According to reviews that's a lowly electret microphone capsule ($1) in a big housing. For podcasting you want a directional microphone. In defense of electrets, they sound great, but pick up everything.
The shape of the microphone suggests it's a directional microphone (and if you read the reviews, people think it is directional). That's perhaps not a scam, but certainly deceptive.
Even apart from AmazonBasics, if you've established a niche product that sells well as a merchant on Amazon, you can be sure that Amazon will swoop in and undercut you ASAP.
That’s a fair point, but I would look to specific markets instead of tech to see where AWS and Amazon might run into issues.
Specifically within healthcare. You think large pharma companies will use AWS after the PillPack acquisition? Do you think that once Amazon.com starts listing prosthetics, large life sciences companies will run on AWS? What about providers? Do you think health systems will choose AWS as their cloud once Amazon launches their version of KP?
I don’t see Azure or GCP getting into these specific markets.
I would agree that it's unlikely that Google or Microsoft will start making prosthetics -- although they could someday spin GCP or Azure off into other corporate entities or operators that do.
More likely would be that Google, Amazon, Microsoft and some affiliated company would be competing in a space where telemetry from something like a prosthetic was reporting information that had some value.
In any situation, the downside of renting something is that you lose control. It is something that you need to think about and incorporate into your business strategy in some scenarios.
TBH such services are super simple to build and low-hanging fruit for AWS; you shouldn't really base your business model and livelihood on a big player's mercy.
Amazon are like pre-lawsuit Microsoft and pre-decree IBM: you exist on their platform until you make enough money that they decide that your profit margin is their business opportunity.
I'm always interested in the selection bias here; Amazon et al do this to plenty of companies that are not using their platform as well. It would be more accurate to say that being an AWS vendor doesn't provide you with added protection from them competing with you.
No, you are also directly providing them with metrics if you use their platform, whereas Azure/Google aren't going to turn around and start selling a competing widget if you use their infrastructure.
My company, Thorn Technologies, has a direct competitor on the AWS Marketplace called SFTP Gateway. We haven't seen any negative impact on sales just yet, probably because our product is much cheaper and, we think, better. But only time will tell!
Yeah, this is so typical of AWS to do this, and I know we're not alone.
An alternative to doing file based SFTP is to just treat SFTP like an API.
A company I work for implemented an SFTP service where every operation simply translates to some SQL DB lookup. And a file download kicks off a larger SQL query and generates the report on the fly, streaming the result straight through to the SFTP client.
Works great! SFTP can be an API just like HTTP. Under the hood the protocol is reasonably contained and doesn't require a filesystem backend at all.
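A minimal sketch of that idea in Python using paramiko's server-side SFTP hooks (the "reports" table, its schema, and the read-only behavior are assumptions for illustration; this version also builds the report in memory rather than streaming it as described above):

```python
# Sketch: answer SFTP operations from a database instead of a filesystem.
# The "reports" table and its schema are made up for illustration.
import sqlite3
import time
from stat import S_IFREG

from paramiko import SFTPAttributes, SFTPHandle, SFTPServerInterface
from paramiko.sftp import SFTP_NO_SUCH_FILE


class ReportHandle(SFTPHandle):
    """Serves a generated report as if it were a file on disk."""

    def __init__(self, data: bytes):
        super().__init__()
        self._data = data

    def read(self, offset, length):
        return self._data[offset:offset + length]

    def stat(self):
        attrs = SFTPAttributes()
        attrs.st_size = len(self._data)
        attrs.st_mode = S_IFREG | 0o444
        return attrs


class DbBackedSftp(SFTPServerInterface):
    """Every SFTP operation turns into a SQL query; no filesystem backend."""

    def __init__(self, server, *args, **kwargs):
        super().__init__(server, *args, **kwargs)
        self.db = sqlite3.connect("reports.db")

    def list_folder(self, path):
        listing = []
        for name, size in self.db.execute("SELECT name, length(body) FROM reports"):
            attrs = SFTPAttributes()
            attrs.filename = name
            attrs.st_size = size
            attrs.st_mode = S_IFREG | 0o444
            attrs.st_mtime = int(time.time())
            listing.append(attrs)
        return listing

    def stat(self, path):
        row = self.db.execute("SELECT length(body) FROM reports WHERE name = ?",
                              (path.lstrip("/"),)).fetchone()
        if row is None:
            return SFTP_NO_SUCH_FILE
        attrs = SFTPAttributes()
        attrs.st_size = row[0]
        attrs.st_mode = S_IFREG | 0o444
        return attrs

    def lstat(self, path):
        return self.stat(path)

    def open(self, path, flags, attr):
        # A download kicks off the query and serves the result directly.
        row = self.db.execute("SELECT body FROM reports WHERE name = ?",
                              (path.lstrip("/"),)).fetchone()
        if row is None:
            return SFTP_NO_SUCH_FILE
        return ReportHandle(row[0])
```

Wiring it up still needs a paramiko Transport with a ServerInterface for authentication and transport.set_subsystem_handler("sftp", SFTPServer, DbBackedSftp); that plumbing is omitted here.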
I did something similar with Apache's FTP server library; as long as the resources are identified by a path it's very convenient and extremely easy: just implement a FileSystemFactory with Apache MINA and off you go.
From parent: "Depends a lot on the usecase of course."
The usecase that I see most often of SFTP (and hinted at in the parent's problem description) is generating one-off reports for third parties, or passing data to vendors who are stuck in the 90s, like financial services companies.
It's almost always read only (or read and delete), in which case implementing an API like this is pretty straightforward. Log unsupported commands perhaps and decide if you want to implement them later.
You could. I mean, at least with OpenSSH you can specify a byte range. That is how lftp is able to chop up files into many streams on SFTP. I can't imagine anyone doing this with a database, however; at least not for writes.
I think this implementation uploads it to memory before going to S3. It usually won't handle 20GB files (unless you have like 20GB of RAM) and in this case, were it a smaller file, it'll just never upload.
You need to make it transactional. Upload to a temp file name (something easily ignored by whatever backend processes are looking at the files) and then do an atomic rename once the transfer is complete.
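A short client-side sketch of that pattern with paramiko (host, user, and paths are placeholders):

```python
# Upload to a temp name, then atomically rename into place so downstream
# pollers never see a half-written file. Host/user/paths are placeholders.
import paramiko


def upload_atomically(local_path: str, remote_path: str) -> None:
    ssh = paramiko.SSHClient()
    ssh.load_system_host_keys()
    ssh.connect("sftp.example.com", username="uploader")  # key-based auth assumed
    sftp = ssh.open_sftp()
    try:
        tmp_path = remote_path + ".part"   # suffix the backend is told to ignore
        sftp.put(local_path, tmp_path)
        try:
            # posix_rename (an OpenSSH extension) overwrites the target atomically
            sftp.posix_rename(tmp_path, remote_path)
        except IOError:
            sftp.rename(tmp_path, remote_path)  # fallback if the extension is missing
    finally:
        sftp.close()
        ssh.close()


upload_atomically("report.csv", "/inbox/report.csv")
```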
Enterprises will love this. There are so many legacy app flows kicked off via sftp/scp file drops. Being able to hook into those via lambda events on the associated S3 bucket will create a whole ecosystem of enterprise spaghetti for years to come.
>There are so many legacy app flows kicked off via sftp/scp file drops
Yes... legacy apps... because no one would choose SFTP for a system designed in 2017.
Seriously, this is great. So many solutions rely on SFTP, but so many companies fail at managing the service. Having an SFTP service that just works and is secure (hopefully) will help a ton of companies.
The only downside is whitelisting, though not on the SFTP server side. Many enterprises restrict egress SFTP (usually for security reasons), so you need to provide IPs, and they can't change frequently because it can take enterprise network admins quite some time to deal with all of the bureaucracy and change control.
That said, I wouldn’t be surprised if modern networking gear can handle CNAMEs, but there’s no guarantee that they’re using modern gear, or, if they are, that the questionable outsourced team even knows how to use those capabilities.
This will certainly help a lot of use cases though.
It's less whether the gear is modern and more about the layer that it operates at.
A network firewall doesn't see the DNS name that an internal system looked up in order to make an outbound connection. It just sees the source/destination IP/port. Processing a rule based on source/destination IP or CIDR and port is very fast, and all happens locally. Trying to make that device handle rules by DNS name is pretty tricky. Does it do a reverse lookup on the destination IP? That may not give a result that's even remotely like what the client used, especially for cloud-hosted destinations.
For a lot of applications (probably including this one), a proxy is a good approach, because DNS resolution can be delegated to the proxy, and therefore the proxy can easily apply DNS-based rules as well as IP/CIDR-based rules. However, proxies tend to make people unhappy because they generally require at least some configuration on the client side. Microsoft used to sell a product[1] that made this transparent for Windows clients[2], but obviously that doesn't help for most modern shops where a lot of the systems are Linux, MacOS, etc.
[1] Internet Security and Acceleration Server ("ISA"), later renamed to Threat Management Gateway ("TMG"), now deprecated and approaching EOL.
[2] It hooked into the network stack and rerouted requests based on a proxy routing rule table. Imagine a centrally-managed proxychains, but with the system configured to default to check the proxychains config file for every outbound TCP connection.
I wonder if you could use the DNS resolution cache itself to do the reverse lookup. As long as the DNS cache lasted at least as long as the TTL, it should work.
Yeah, I assumed something like Palo Altos or maybe ASAs could do more since (I believe) they're doing actual inspection, but I'm only familiar with them in passing.
This is something that AWS needs to change. I work with millions of large files and have to keep a very large (PBs) local storage array just to make sure that things are right before uploading to S3 so that I don’t have to pay and wait for arch changes like this.
I think the point is that on a filesystem the mv is atomic and just updates an inode, but on S3 those operations can take place on thousands of different machines.
We currently pay $250/month through some small vendor for HIPAA-compliant SFTP hosting (on which we transfer a whopping 50kb on a weekly basis). I always felt like it was a rip-off, but Azure/AWS didn't have their own version. And I'm loath to manage a VM. PaaS is my sugar bear.
My eyes lit up when I saw this. We're an azure shop, but I'm not afraid to use AWS for limited cases. Then I saw - $.30/hr (so, $214/mo). Really? REALLY?
Wouldn't it be comically easy to just add SFTP as a protocol option for S3? Why does this need a dedicated VM to run it? (Yes, I know this is PaaS and you don't manage the VM, but they're essentially pricing it that way.)
HIPAA compliance, even on AWS, is extremely expensive. I believe the best vendor for HIPAA (someone correct me if I'm wrong) is Google Cloud. Last time I checked, they did not charge any extra for HIPAA BAA signing.
Edit: I stand corrected on this; AWS no longer requires dedicated hardware for a HIPAA BAA. Sorry, I didn't look this up and had old information.
AWS is IMO the best vendor if you are looking for HIPAA-compliant cloud computing. Our bills are higher than they would be for a non-medical application, but nothing astronomical. Programmer time is still way more expensive.
My work has taken the aggressive stance of no HIPAA data on AWS due to legal and billing issues, not technical ones. Technically it's a good solution, and we might use it down the road. We already use AWS/S3 for firmware device updates.
Couldn't you use an Azure web app? It has FTP and user management, though not as granular as this. You should be able to use IAM to only allow the user access to that one web app, and they can set their own username/password. Not quite as simple as sending out creds from one interface, but it's an option. Not sure how many users you have; obviously if you had tons and tons of users it would become a chore, but if it's just a few users I'm thinking a web app could handle that.
Alternatively, if you trusted users enough, you could use an Azure blob and CloudBerry. That one is probably not HIPAA compliant though.
I don't even know if this new AWS SFTP plan is HIPAA compliant; don't you have to have a log of file check-ins/outs? And user login logs?
For companies, $214/mo isn't much if it makes an admin's life easier.
At the same time, Amazon isn't going to price it so it's attractive to everybody, because it sounds like they'd rather people not use it if possible. Sounds sensible to me, legacy stuff is always going to cost you one way or another.
Nothing beefy, probably, but as for HIPAA compliance, AFAIK you need to sign several specific contracts with your provider and blahblahblah; probably they're just billing you for the inconvenience and for having the HIPAA seal.
You have to sign a single business associate agreement (BAA) depending on the nature of the business you are working with. These are usually boilerplate contracts around 2-3 pages long and full of legalese.
It is uncommon for someone to charge you for signing a BAA. It is very common to tie these plans into enterprise only pricing. This is terrible because it adds unnecessary costs to the medical system (which get passed onto consumers) and it completely shuts out smaller players from entering the marketplace. (Cue me side eyeing every single error tracking software SaaS currently on the market who wants to start at 5K a year for their 'small' plan - get real guys)
Yep, I was checking the DICOM (medical images) Viewer plugin for Box the other day and, yeah, they don't charge you for the BAA, but you're required to get an Enterprise or Elite plan, whose price isn't even listed and is probably in the thousands:
"Pricing for Box Enterprise or Elite plans as well as the DICOM Viewer additional seat surcharge can be handled by our Box sales team once we know how many seats your are looking for across your company and what types of collaboration use cases you need. "
This is an enterprise offering. We gladly cough up more than this on almost every service so we don’t have to manage it. If something breaks, we just use our business support plan.
A tonne of our (enterprise-ey) customers had such trouble trying to integrate into our S3 flow that we started launching a VPS for each that abstracted it away into a simple SFTP upload/download, which they were used to.
Although this is much more expensive than Lightsail, the man hours saved will make it worthwhile.
> Can you elaborate? I mean a plain CentOS server running SFTP, S3FS seems about as set and forget as it gets.
Think about the operational costs: someone needs to manage keys, logging, security updates, when S3FS coughs a lung and hangs you need to catch that problem and remount it to restore service, etc. This service reuses the existing authentication systems so you don't need to spend time configuring and managing integration with your customers’ LDAP/AD infrastructure, etc. If you deal with anything which hits PCI, HIPAA, etc. you need to be able to certify that your custom design meets those requirements as well.
That's not to say you can't do it yourself but for many places there's a fairly significant amount of work where the cost of doing it yourself is greater than 5+ years of managed service costs.
Exactly this. If sticker cost is your leading factor then these kinds of services can seem crazy, but when you factor in the real cost of self-hosting then it quickly becomes a no-brainer.
We're more interested in what happens when things break (and whose responsibility it is) than minor cost savings in calm waters.
One other area which tends to get ignored is opportunity cost: if it's the only thing you do there are many things which aren't that hard to operate but if they're not a primary function the cost of having to pull someone off of other projects to handle problems, security updates, etc. is more than the direct service costs.
S3fs means you can use most existing apps without managing local storage. It doesn’t work quite as well in practice but the concept is appealing if you need to support software which wasn’t designed for AWS and uses non-trivial data volumes.
So, how do I make sure my connection isn't MITM-ed?
There is no server host key anywhere to compare. No CA certificate support. Doesn't look like ed25519 is supported either.
Somehow people don't use self-signed certificates all over the web but for sftp it's "fine" apparently.
For SSH (+SFTP) you are expected/obligated/etc. to have some way to verify the correct host key. There is no relationship to the clusterfudge of public CAs. Nor are there x509 certs.
This is why AWS is so far ahead: survey the landscape, find the things they don't already cover, and come up with a managed service for it. It's usually not perfect, but it almost always just works.
You should try the mirror sub-system of lftp [1]. It can replicate rsync behavior on a chroot sftp server. No idea if that works on Amazon, but I use it all the time on my own chroot sftp servers.
lftp is fantastic. The mirror function has a “reverse mode” too
For regular tasks you could also look at rclone, which is like rsync in many ways but can upload to S3, Backblaze B2, SFTP, and many more directly, without needing support on the remote end.
So, I spent a couple days a few months ago building exactly this on an EC2 instance. I have an SFTP service running on an Ubuntu box; it has jailed homes for users, it's ssh-key-only, uses s3fs to persist things to the correct buckets, etc.
My only problem with the managed service (which I'd LOVE to switch to tbh) is I can't for the life of me get it to actually connect and upload a file. I suspect I'm doing something wrong in IAM, but the tutorials suck and it looks like IAM isn't even ready for this service yet. I can get a user authenticated, but it's like it's trying to figure out where "home" is and crapping out, connection closed. Nothing helpful in the verbose output, either. Bummer.
And to emphasize, the process of simply adding a user to this thing SUCKS. In my homebrew instance, it's just a matter of generating the key pair and dropping the public key into a folder on S3. Cron job reads the bucket, creates new users/homes/etc for anything new, all pasted together using bash scripts basically. But at the end of the day it's ridiculously simple. I'd hoped a fully managed solution would actually be simpler (instead of simply more stable because it's managed, after all).
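For illustration, here's roughly what that kind of cron job could look like in Python with boto3 instead of bash (bucket name, key layout, group name, and the sshd chroot details are all assumptions):

```python
# Rough sketch of a provisioning cron job: for every public key dropped into
# an S3 bucket, create a matching jailed SFTP user on the box.
# Bucket name, key prefix, and group are made up; sshd_config chroot setup
# and ownership requirements are not shown here.
import os
import subprocess

import boto3

BUCKET = "example-sftp-pubkeys"
PREFIX = "users/"          # objects named users/<username>.pub
SFTP_GROUP = "sftponly"    # group your sshd ChrootDirectory rule matches on

s3 = boto3.client("s3")

for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX).get("Contents", []):
    username = os.path.basename(obj["Key"]).removesuffix(".pub")
    home = f"/home/{username}"
    if not username or os.path.exists(home):
        continue  # already provisioned (or a stray object)

    # Create the user with no login shell.
    subprocess.run(["useradd", "-m", "-d", home, "-g", SFTP_GROUP,
                    "-s", "/usr/sbin/nologin", username], check=True)

    # Drop the public key into authorized_keys with sane permissions.
    ssh_dir = os.path.join(home, ".ssh")
    os.makedirs(ssh_dir, exist_ok=True)
    key_body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
    auth_keys = os.path.join(ssh_dir, "authorized_keys")
    with open(auth_keys, "wb") as fh:
        fh.write(key_body)
    subprocess.run(["chown", "-R", f"{username}:{SFTP_GROUP}", ssh_dir], check=True)
    os.chmod(ssh_dir, 0o700)
    os.chmod(auth_keys, 0o600)
```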
Ah this would've been so useful 18 months ago. I had to spend MONTHS to convince a vendor (government) to use S3 to upload (keybase-encrypted) files instead of SFTP.
And they finally budged. This would've been so much easier.
No FUSE. Pure Go so it's low on resource usage and high in platform compatibility. No OpenSSH. No screwing around with Linux users or whatever. Just a single declarative configuration file. You can run this baby in a Docker container with some adjustments to the host if you want this on port 22.
I had to sourcegraph GitHub a bit to find this thing. SEO is so bad on this implementation. I don't know why.
We ended up implementing a REST API endpoint for SFTP to provide an easy way for web apps to transfer content without having to speak the FTP protocol: https://kloudless.com/products/file-storage/
I can see this being valuable for apps to get user content into S3 more efficiently from the server-side rather than funneling it through hosted servers. The one caveat is programmatic user management, which I'm sure is possible.
It's SSH File Transfer Protocol. When you say Secure File Transfer Protocol many people think about FTP over SSL if you don't emphasize it's about SSH.
> many people think about FTP over SSL if you don't emphasize it's about SSH
Huh? Sure there's always potential for confusion but every time I heard anything about FTP over SSL (which no one seems to actually use) it's been called "FTPS"
I agree FTPS is the right acronym for this, but I had to correct people about it all the time. So many people actually have no idea that SSH does more than just letting you execute command-line programs on a remote server, and that FTP is not the only/best protocol for accessing remote file systems over the Internet.
In computing, the SSH File Transfer Protocol (also Secure File Transfer Protocol, or SFTP) is a network protocol that provides file access, file transfer, and file management over any reliable data stream.
It seems there are no web-hooks / callbacks, so you don't get notified when a new file is uploaded (or someone downloads a file).
Another issue is that if you have to support a partner with SFTP data transfer requirements, you may have to support one with FTP/FTPS requirements as well. At that point you will have to go to a dedicated FTP server (or outsource it to another company) anyway, and the AWS SFTP service will be redundant in this scheme.
It's S3, so you can use Lambda, it says so in the article.
> You can write AWS Lambda functions to build an “intelligent” FTP site that processes incoming files as soon as they are uploaded, query the files in situ using Amazon Athena, and easily connect to your existing data ingestion process.
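A bare-bones sketch of such a Lambda handler (the kick_off_ingest step is a stand-in for whatever downstream processing you'd actually run):

```python
# Minimal Lambda triggered by s3:ObjectCreated events on the bucket behind
# the SFTP endpoint. kick_off_ingest is a hypothetical downstream step.
import urllib.parse

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
        print(f"New upload: s3://{bucket}/{key} ({size} bytes)")
        kick_off_ingest(bucket, key)


def kick_off_ingest(bucket, key):
    # e.g. enqueue an SQS message, start a Step Functions execution,
    # or run an Athena query against the new object.
    pass
```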
Unless I'm missing something, this functionality has been on the AWS Marketplace for a while. We've already used an SFTP Gateway straight out of the marketplace. This is tough news for these folks, and generally speaking, if you're making good enough money off the marketplace, then you're possibly on a collision course with Amazon's "new" roadmap.
The AWS pricing page for this service says it costs about $225/month for a lightly used instance. I implemented the same kind of thing on AWS using a nano-sized instance for about $10/month. The instance is managed with an Ansible role for automated SFTP server management. I connected it with an off-the-shelf AWS Lambda function which listens for S3 PUT events and copies files to the SFTP server as needed.
My solution took a little more human-time to setup than the AWS service might, but once setup, it saves about $200/month.
$200 a month is nothing for a business. Anything where we don't have to manage it ourselves or worry about reliability and scalability, and can just use our AWS business support plan, is a win.
The alternative is developer time. Nothing about managed services is ever less expensive if you don’t account for developer/Devops/netops time saved.
A small company has even more of a reason to want as many managed services as possible. You can avoid hiring netops if you have both a third-party managed service provider to manage your network and developers/architects who know enough to fill in the gaps.
On the other hand, netops staff costs are a lot less... is liquid the right word?
Yes, $200/month is probably not any more than a couple hours/month of even a very lowly paid developer or ops person, once you account for benefits and overhead.
But once you needed to hire that person for any reason... their annual salary is already on the books. Giving them more work to do doesn't affect your budget. But another $2400 a year might. Yeah, if you can avoid hiring that person _at all_... but you probably had some reason you did have to hire a person or three already, and now you've got them.
The actual experience of working in a small under-resourced organization, in my experience, often looks like this.
That’s why you don’t hire them at all. You use an MSP. Even if you do need someone on prem, the simpler you make your infrastructure, the less skilled your netops person has to be. You can hire someone who basically is a help desk person.
When that one netops person leaves, it usually falls on the developers to manage it.
Bare metal vs. cloud hosting: resource for resource, bare metal will almost always end up being cheaper.
The only way you save money on managed services is the cost of management. Meaning every hour that someone doesn't have to spend maintaining infrastructure is a cost savings to the business. Every minute saved by allowing someone else to do the "undifferentiated heavy lifting" is money saved.
This incurs per-hour charges to run the VM that runs sshd, same as running a micro instance with FUSE S3 would, although with slightly less admin attention required.
Presumably this will handle large file uploads with aplomb? Multipart upload with s3 can be a pain (when you want someone else to be doing the uploading).
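When you do control the uploading side, boto3's managed transfer does the multipart dance for you; a sketch with assumed bucket and file names:

```python
# Sketch: boto3's managed transfer switches to multipart automatically above
# the threshold, so a large file never has to sit in memory in one piece.
# Bucket and file names are assumptions.
import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # use multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=8,
)

s3 = boto3.client("s3")
s3.upload_file("huge-export.tar", "example-bucket",
               "incoming/huge-export.tar", Config=config)
```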
New services rarely launch with CF support, if you want to programmatically create SFTP servers TODAY you could write a Lambda that uses the SDK and reference that Lambda with a CF Custom Resource.
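A rough sketch of what that custom-resource Lambda could look like (error handling trimmed; the create_server/delete_server calls are the Transfer API via boto3, and the response plumbing follows the standard custom-resource contract):

```python
# Sketch of a CloudFormation Custom Resource Lambda that creates/deletes an
# AWS Transfer for SFTP server via the SDK. Only Create and Delete are
# handled; Update is left as an exercise.
import json
import urllib.request

import boto3

transfer = boto3.client("transfer")


def handler(event, context):
    physical_id = event.get("PhysicalResourceId", "none")
    data = {}
    try:
        if event["RequestType"] == "Create":
            physical_id = transfer.create_server(
                IdentityProviderType="SERVICE_MANAGED")["ServerId"]
            data["ServerId"] = physical_id
        elif event["RequestType"] == "Delete" and physical_id.startswith("s-"):
            transfer.delete_server(ServerId=physical_id)
        status = "SUCCESS"
    except Exception as exc:
        print(exc)
        status = "FAILED"

    # Report back to CloudFormation via the pre-signed response URL.
    body = json.dumps({
        "Status": status,
        "Reason": f"See CloudWatch log stream {context.log_stream_name}",
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,
    }).encode()
    req = urllib.request.Request(
        event["ResponseURL"], data=body, method="PUT",
        headers={"Content-Type": "", "Content-Length": str(len(body))})
    urllib.request.urlopen(req)
```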
FWIW I talked with one of the CF devs at re:invent and he said their team's goal is to have day-one CF coverage of new major offerings going forward, so we'll see. Maybe next year.
Ideally the same way we define an EC2 instance, perhaps bound directly to an S3 bucket resource defined in the same script. Ideally reading the config definition from an S3 file that we can update at will.
Unless you disable it in the sshd_config, it's supported by most Linux distributions. Yes, you'll need a client, but any modern client supports sftp.
The only tricky part is chrooting the users.
Hard to make a B2B Amazon tool these days.