Do you chunk the upload so if I lose internet for a second, it can just upload the chunks that failed?
This is an issue I have with one of my employees. They only get consumer Comcast, and the connection is crap. I ended up writing a bash script that will chunk and scp the files to a central server, and if it fails a chunk, it leaves it in the directory for her to manually re-run.
That said, I pay $20/mo for the file server (which I would pay for no matter what), and it has 3TB of outbound bandwidth and unlimited inbound, and it's less than 5% of your cost.
Although we do chunk the upload and send the chunks in parallel to the server, we do not have the recovery functionality implemented at this point. We are able to recover from a disruption in internet connectivity with retry logic in JavaScript, but if your computer crashed completely we would not recover. That is on the roadmap and expected to be released in the next couple of months, though.
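To give a rough sense of what that retry logic looks like, here is a generic sketch of a browser-side chunk upload with backoff (illustrative only, not our production code; the endpoint and retry numbers are placeholders):

```typescript
// Generic browser-side chunk upload with retry and exponential backoff.
// Survives a brief connectivity drop, but state lives only in the page,
// so a full browser or computer crash still loses the transfer.
async function putChunkWithRetry(
  url: string,
  chunk: Blob,
  maxAttempts = 5,
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const res = await fetch(url, { method: "PUT", body: chunk });
      if (res.ok) return; // chunk accepted, done
      throw new Error(`HTTP ${res.status}`);
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // give up, surface to caller
      // wait 1s, 2s, 4s, ... before retrying
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
    }
  }
}
```

Because the state lives only in the page, this kind of logic survives a dropped connection but not a full crash, which is why the recovery feature mentioned above is still on the roadmap.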
In terms of cost, there are certainly cheaper ways to accomplish transfers. MASV is designed to be very fast, and we have 9 servers across the world to enable transfers anywhere. If you're doing local transfers, your own file server is likely good enough, but if you're sending hundreds of GBs across the country or across the world, you will likely run into major performance issues. It really comes down to how much your time is worth and whether file transfers are a consistent requirement for your business. This is certainly intended to be a B2B tool, less for consumer use.
> I ended up writing a bash script that will chunk and scp the files to a central server, and if it fails a chunk, it leaves it in the directory for her to manually re-run.
Use Syncthing, it traverses NATs, it is trivial to set up, it is seemingly very secure by its nature, and crucially for you, it transfers in blocks. You can administer it through a web interface with credentials (or with none, if it's just listening on loopback).
A lot of why our users enjoy MASV is that it's dead simple to use for non-technical users. Not that Syncthing is very hard, but as a video pro dealing with clients, the easier it is for them to just click a download button, the more likely it is to happen. Also, because it's pay-as-you-go, it's easy to charge back the transfer cost as part of the project bill.
Most certainly. Another major benefit is that you folks provide storage, which inherently simplifies administering something like this. One benefit of Syncthing/BT Sync that I have not seen emulated by other vendors is peer-to-peer transfers, which can be considerably faster if both peers are on the same network or an adjacent one (including one on the same provider, or connected to the same exchange, or if the central service has no nodes on the same continent as the peer). The performance is just unmatched for such transfers, in my experience. A peer-based protocol could be a considerable help even if you don't intend on allowing direct peer-to-peer transfers, because (if implemented with sufficient care) it can allow you to choose better nodes to maximize throughput and integrity.
I think there is a very legitimate need for centralized administration. I have all of my Syncthing node configurations synchronized over Syncthing itself (which is a bit of a risk, but not overly so), but I would consider paying for something more integrated (if I could have similar confidence and flexibility with the client software), with some safeguards against accidentally disabling configuration sync to a specific node (which would require me to directly configure that node), and some integrated ability to have nodes self-report and self-allocate.
Absolutely agree. We really built our network to be independent of our file transfer tool. It creates routes across any cloud provider and can integrate with any cloud storage provider. It also uses machine learning to determine which routes are optimal based on previous performance and time of day. I see a future where you could use our network for many different products, such as an accelerated VPN, streaming live content, or, as you pointed out, as a pass-through network for a P2P connection: if both endpoints are available, it would route across our network to avoid congestion, potentially even making a copy of the transfer in the cloud for archive. Even P2P relies on routing across the public internet, which is prone to congestion and is set up to do least-cost routing for the ISPs. Usually it's less of an issue, though, because the protocols help push through that congestion. Lots of cool possibilities for this in the future and nowhere near enough time!
I'll have to check that out. I was just desperate one afternoon after dropbox was failing to sync because of connectivity issues, and I knew bash just well enough to split, scp, rm, repeat.
You may also want to look into using rsync as an alternative to scp. rsync is much smarter about how it syncs data, so it should be better for your use case.
Is your $20/mo server 1Gbit? At a fairly common 100Mbit, these file transfers would be 1/10 the speed, and the files would also need to be downloaded. Even slower if you need to send to multiple people.
Running BitTorrent lets peers share the data you upload and gets the same net effect as this. But that's not, as far as I know, available as a browser plugin.
PS: I don't think it's a big deal, but executed well it might be a valuable niche.
It averages 400Mbit upload and 350Mbit download, not that it matters to me personally, as my employee can rarely upload faster than 4Mbit.
That said, their 1Gbit connection will probably average around the same speed if they have more than a few users uploading 50GB files during normal business hours.
Just to clarify, we will fill a 1 Gbps pipe, but our servers are provisioned to do much more than 1 Gbps. If many users are uploading at the same time, we scale up beyond 1 Gbps; that's not our network bandwidth maximum.
That's good to know. 1Gbit is our "max" but that works just fine for the company's "network share". If we ever were to start hitting our bandwidth caps, I can pay to increase those, but the speed is pretty much fixed without changing providers. If speed were ever to be an issue, I would probably look at a different service like yours. It would be a massive hit at your current pricing, though.
For starters, it's a proper replacement for "a bash script that will chunk and scp the files to a central server, and if it fails a chunk, it leaves it in the directory for her to manually re-run". It's installed on almost all Linux and Mac systems, so if your customer is using a script already, they might as well use rsync.
> Would you mind explaining why rsync is a better solution?
rsync basically chunks the files and sends only the missing (or differing) chunks over ssh, and it leaves the files in the destination directory if something fails. It is more or less an automated tool that already does what the GP is talking about.
But that 'someone' is someone who is already running a Bash script and re-running it each time an error occurs. So I don't see the problem with using a CLI here.
For technically capable people there are certainly lots of good options for sending lots of data fast. We are just trying to make a service that makes this more accessible and requires no setup. As a side note, we are currently working on an API, and we intend to integrate with rsync so it can be used to upload content to our network.
I read it as "I'm running a bash script for someone else so they can pick it up from a server", not "my employee who has this issue created a bash script"...maybe I misinterpreted.
Because we have to deal with people as they are, and not as how we want them to be.
This is for sending large assets from party A to party B, where either the parties do not have access to a shared rsync-capable server, or they do not know how to use rsync. On a deadline, you can't afford to become a CLI trainer or require special software.
A common use case would involve service bureaus and creators. Trust me when I say that there are a hell of a lot of creators out there who would never be able to manage rsync. You want to make stuff like this drop-dead simple for people. That means a service that works in-browser.
Couldn't have summarized it better myself. I like to think we are just making cloud services more accessible for non-technical people and adding a tax to that. There are always ways to get something cheaper by rigging up a setup yourself, but for creators that is time that could be spent on creating instead of infrastructure or tooling.
I use a javascript library that works in the browser and supports S3 file copy chunking and resuming. Host it in a simple S3 bucket, create a target bucket with Transfer Acceleration enabled, and Bob's your uncle.
It's very efficient. I've never seen it fail to consume all of the available upstream bandwidth from the client. If this is of interest, I can go dig up the library I used.
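This isn't necessarily the exact library, but AWS's own SDK v3 gives the general idea today: the @aws-sdk/lib-storage Upload helper chunks a file into parts and uploads them in parallel from the browser. A minimal sketch, where the bucket name, region, and credential wiring are placeholders:

```typescript
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

async function uploadToS3(file: File): Promise<void> {
  const client = new S3Client({
    region: "us-east-1",
    useAccelerateEndpoint: true, // target bucket needs Transfer Acceleration enabled
    // credentials: supply via Cognito / temporary keys in a real browser app
  });

  const upload = new Upload({
    client,
    params: { Bucket: "my-target-bucket", Key: file.name, Body: file },
    partSize: 8 * 1024 * 1024, // 8 MB parts
    queueSize: 4,              // parts uploaded in parallel
    leavePartsOnError: true,   // keep finished parts so the multipart upload can be resumed
  });

  upload.on("httpUploadProgress", (p) => {
    console.log(`${p.loaded} / ${p.total} bytes`);
  });

  await upload.done();
}
```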
How is this better than CloudBerry + AWS S3? CloudBerry allows you to map a drive and makes an AWS S3 bucket look like a standard Windows drive. It also supports multipart upload and retrying failed chunks.
It's intuitive, built on top of AWS, and you can take advantage of all of the other AWS features -- e.g. sending an email notification when a file is received, versioning, fine-grained access control, as much storage space as your budget will allow, etc.
Is there a way to transfer ownership of an S3 bucket? Like, can I upload a file in my AWS account, and by some means transfer the bucket and files to your AWS account? Without the actual data bits moving and incurring a bandwidth charge? That would be ideal for some scenarios.
From the CLI you can copy files to another bucket, in another account owned by someone else, within the same region without any bandwidth costs. You only pay for the copy requests, at $.01 per 1,000 requests.
You could also create a Lambda function that is triggered any time a file lands under a certain key prefix (like a directory, but not really) and copies it to the other account. Of course, both of you have to set up permissions.
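A rough sketch of that Lambda approach, assuming the AWS SDK v3 for JavaScript; the DEST_BUCKET name and the "outgoing/" prefix are placeholders, and both accounts still need the bucket policy / IAM permissions set up for the cross-account copy:

```typescript
import { S3Client, CopyObjectCommand } from "@aws-sdk/client-s3";
import type { S3Event } from "aws-lambda";

const s3 = new S3Client({});
const DEST_BUCKET = process.env.DEST_BUCKET!; // bucket in the other account

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const srcBucket = record.s3.bucket.name;
    // S3 event keys arrive URL-encoded
    const srcKey = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    if (!srcKey.startsWith("outgoing/")) continue; // only copy the chosen "directory"

    await s3.send(
      new CopyObjectCommand({
        CopySource: `${srcBucket}/${srcKey}`, // server-side copy, no re-upload
        Bucket: DEST_BUCKET,
        Key: srcKey,
        ACL: "bucket-owner-full-control", // let the destination account own the copy
      }),
    );
  }
};
```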
If you're working for a company whose business is to transfer massive files, the IT department should be more than willing to pay for CloudBerry. They would need to set up the AWS account and S3 permissions.
An AWS + CloudBerry solution should be a lot easier to sell to an IT department than an unknown company's solution.
My point was that a browser-only solution is more tenable than installing an application in many cases.
I get files from customers in Fortune 100 companies and many of their IT departments absolutely prohibit the installation of something like Cloudberry. It was explained to me that getting IT to approve the installation of a particular application was a multi-month endeavor.
How does this compare against existing services? Wondering since we use box for file sharing at our company, and file upload speed has never really been an issue for us.
It depends on which service you compare us to, but you could almost think of us as the WeTransfer for large video sharing. We don't have any file size limits, we retain full folder structures and create dynamic zips designed for each target OS, there are no plugins, it's pay-as-you-go so it fits nicely with project-based businesses, custom branding sets us apart from some of the cloud sharing tools, and the interface is intentionally simple to use. We plan to integrate with video-specific tools in the near future as well, such as Adobe Premiere, Final Cut Pro, etc., which should make our positioning clearer. When you're dealing with sending terabytes of data a month, small improvements in performance become much more important to your overall business scalability.
That was my first thought as well. There are tons of file sharing services that already exist, but I think their advantage here is allowing people to send files larger than 20GB. With Dropbox, files have to be 20GB or smaller.
Absolutely, we don't try to hide that information; you can try out our file transfer calculator here: https://www.masv.io/file-transfer-calculator/ - This compares MASV to shipping hard drives and uses real rates from the FedEx API. The way I like to think of it: it's much more convenient to transfer online instead of shipping hard drives, since there is just less to deal with and shipping is harder to scale. So if MASV is faster than, or not much slower than, shipping a drive or hand delivering, then it's time you can spend on taking on more projects.
You got it. In general, you can send as much or as little as you need with MASV because it's pay-as-you-go, which means you don't have to manage fixed storage and it scales up and down with your work. If you don't use it the next month, it costs nothing.
It seems to be targeted at content creators that routinely have to send very large (~1TB) files. If Masv can do this in a fairly sane way, without issues like "whoops, failed at 99%, try again!" then that's quite a good niche to be in.
Especially since there are plenty of post-production companies that host their own 'customer portal' where files can be uploaded/downloaded, which probably has plenty of security issues and so on. This can then be replaced by a branded Masv page, I guess?
Tried to make sense of one of their patents: patents.google.com/patent/US8548003B2/en
Data transmission units (data units) from the source network are received at an encoding component logically located between the endpoints. These first data units are subdivided into second data units and are transmitted to the destination network over the transport network. Also transmitted are encoded or extra second data units that allow the original first data units to be recreated even if some of the second data units are lost. These encoded second data units may be merely copies of the second data units transmitted, parity second data units, or second data units which have been encoded using erasure correcting coding. At the receiving endpoint, the second data units are received and are used to recreate the original first data units.
Got brain damage. What, ffs, does it mean? Help :)
Sounds like they cache your upload on a server physically near you, and use something like BitTorrent to haul it to a server physically near your destination. Sort of like a CDN for file transfers.
This way they can reduce the number of hops (and potential bottlenecks) between their server and destination, maximizing bandwidth.
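In plainer terms, the "encoded second data units" are redundancy chunks: the sender adds extra chunks so the receiver can rebuild the originals even if some go missing in transit. A toy sketch of the simplest variant, single XOR parity (the patent also covers plain copies and full erasure-correcting codes), assuming equal-length chunks:

```typescript
// XOR all chunks together; with k data chunks plus this parity chunk,
// any single lost chunk can be rebuilt from the survivors.
function xorParity(chunks: Uint8Array[]): Uint8Array {
  const parity = new Uint8Array(chunks[0].length);
  for (const chunk of chunks) {
    for (let i = 0; i < chunk.length; i++) parity[i] ^= chunk[i];
  }
  return parity;
}

// Rebuild the one missing chunk: XOR of the surviving chunks and the parity chunk.
function recoverMissingChunk(survivors: Uint8Array[], parity: Uint8Array): Uint8Array {
  return xorParity([...survivors, parity]);
}

// Example: 3 data chunks, lose chunk b, recover it from a, c, and the parity.
const a = Uint8Array.from([1, 2, 3]);
const b = Uint8Array.from([4, 5, 6]);
const c = Uint8Array.from([7, 8, 9]);
const p = xorParity([a, b, c]);
console.log(recoverMissingChunk([a, c], p)); // -> Uint8Array [4, 5, 6]
```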
Exactly correct. We are running on Azure as well, so it's slightly more expensive. The idea is that eventually, as we load the service with users, our cost will come down and we can pass some of those savings on to our users. We have built out a network that can run on any cloud, so in the future we will be able to further optimize cost by finding less expensive routes that have the same level of performance.
AWS/GCP/Azure bandwidth is already incredibly overpriced. You can easily find $0.005/GB and less from other providers, if you’re only interested in server hosting.
For many applications, you can host servers elsewhere while using AWS for S3, DynamoDB, etc.
We wanted to stand up a high-performance service to start and took on extra cost to do so. We intend to continuously optimize our system to maintain that performance while accessing it through more affordable options. Even just being smarter about how we spin servers up and down could reduce costs, beyond adding other providers to the mix. Ideally, once we have some mass of users, we will be able to get better pricing and optimize our cost further, which should enable us to pass on savings.
The pricing includes 10 days of storage and the cost of the bandwidth. Today an upload can be downloaded many times, but in the future we will be deploying download-based billing, which will charge you for the data downloaded. We are also working on a pay-as-you-go storage option so you can extend the expiry beyond 10 days and pay some low per-GB rate for the data stored.
Aspera uses UDP to work around issues sending large files over long distances on big pipes, which TCP usually isn't great for (1). How does MASV handle this over TCP? Do you have some 'TCP accelerators' in between, or is it just heavily chunked content?
Yes to the TCP acceleration. Our parent company LiveQoS is a networking technology company. We use our own TCP acceleration software to enable faster downloads across the clouds. Upload acceleration is handled through chunking and by having many servers in major locations globally reducing latency.
Can you talk a little bit about the implementation details?
I assume that it's using WebRTC data channels to upload the file. WebRTC includes STUN / TURN to bypass firewalls, which makes it a great fit.
My guess is that the performance comes from chunking the data and sending it through multiple data channels. The other performance gain is bringing the receiving end close to the sender to reduce ACK latencies. I don't think that WebRTC allows you to develop custom UDP protocols.
Happy to. We are not using WebRTC. Uploads are accomplished by geolocating the user to whichever of our 9 global servers is closest to them, then chunking the data and sending it through multiple data channels using JavaScript in the browser. On the download side, when a user initiates a download, we geolocate the downloader and calculate the optimal route across our cloud network. Our network then uses our own in-house TCP acceleration technology to stream the data quickly across the cloud and to push it fast from our exit edge server to the end user's location. We use a combination of parallel connections, TCP acceleration, premium middle-mile networks, and reduced latency to maximize performance.
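In code terms, the browser-side piece boils down to something like this sketch (illustrative only, not our actual client; chunk size, concurrency, and the endpoint URL are placeholders, and per-chunk retry, as in the earlier sketch, is left out for brevity):

```typescript
const CHUNK_SIZE = 16 * 1024 * 1024; // 16 MB parts
const PARALLELISM = 4;               // concurrent connections

async function uploadFile(file: File, endpoint: string): Promise<void> {
  // Slice the file into fixed-size chunks without reading it all into memory.
  const parts: { index: number; blob: Blob }[] = [];
  for (let offset = 0, i = 0; offset < file.size; offset += CHUNK_SIZE, i++) {
    parts.push({ index: i, blob: file.slice(offset, offset + CHUNK_SIZE) });
  }

  // Simple worker pool: keep PARALLELISM chunks in flight at any time.
  let next = 0;
  const workers = Array.from({ length: PARALLELISM }, async () => {
    while (next < parts.length) {
      const { index, blob } = parts[next++];
      const res = await fetch(`${endpoint}?part=${index}`, {
        method: "PUT",
        body: blob,
      });
      if (!res.ok) throw new Error(`part ${index} failed: HTTP ${res.status}`);
    }
  });
  await Promise.all(workers);
}
```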
How come we don't have a good P2P file transfer system that everyone can use now? Having a server sit in between just passing it along seems like a waste.
I want as few inbound ports open as possible, and I'm assuming so does every person. So the simplest solution is to instead have two outbound connections that talk to a middleman.
Are there better ways? Of course. But this is simple to wrap your head around and can be accomplished by anyone with even the most basic technical understanding.
This looks really well designed -- but the performance numbers seem slightly suspicious to me.
Both Dropbox and Aspera do UDP + compression + retry, so I'm very curious to see how you are able to get such a drastic improvement -- are you using a special form of compression?
Hey! In terms of maxing out your bandwidth, there are a few options. Aspera uses UDP, which adds some overhead to the transfers and requires you to install plugins or software to connect both ends. The way we max out your bandwidth is by reducing the latency between you and the server closest to you, then using JavaScript to push multiple TCP flows at the same time, which can also effectively fill your pipe. On the download side of things, we route the data across premium cloud networks and use TCP acceleration in the cloud to enable fast downloads. TCP acceleration tech only needs to be on the sending device, so it requires no plugins on the download side, versus UDP. Not having plugins is a big benefit because there are fewer firewall concerns, and it means you can use our service in more restrictive IT environments.
Just to be clear, we are not just using the default Azure setup. We have our own proprietary TCP acceleration technology in use on the Azure network and have 9 servers globally enabling acceleration. This test just shows a regular upload to Azure, not a routed upload through our network.
Sorry about that, we will look into it right away. The engineers building the web app are much better than my website coding. We will try to reproduce the issue, get it fixed, and let you know.
Torrenting requires software and seeders, which means you're throttled by the seeding parties' internet connections. Torrenting can be a good option, but you can run into port issues, or it can be blocked under restrictive IT policies. ISPs frequently throttle torrents as well. It all depends on what environment you have and the network of the recipient. MASV is intended to require nothing but a browser and enable transfers in a clean, easy-to-use interface. In other words, it's client-proof, which many freelancers or service companies would understand. It's not meant to replace major repeatable workflows happening from the same controlled networks; it's meant for ad-hoc deliveries from on-set locations or project-based work.