There's no way to do any processing of the data, like aggregations, min/max, or grouping, meaning you would always have to fetch the raw data points and do the processing yourself. This kind of works only because the number of datapoints this supports is so incredibly low. Seeing billions of datapoints over thousands of time series is not that uncommon, and you wouldn't be able to use this service for those cases even with the "Business Pro" account for $450/month.
When it comes to serious business users, you should make sure you don't impose any hard limits on them; charge them for their usage instead. You point out how scalable the service is, but you don't allow customers to scale their usage of it. Consider why every cloud vendor (AWS, Google, Azure) charges per API call, per object stored, and so on. They are happy to take your 10 petabytes of data and charge you handsomely for it.
Overall it seems like a service for toy projects or projects with very very low amounts of data. But then again you'd be able to store this amount of data in any SQL DB without problem while retaining control of your data, having much more flexibility, getting more speed out of it and being able to do computations on the data in an easier way. This is just too limited, both in terms of amount of data as well as featureset.
I hope this comment doesn't sound too harsh, I intended it as constructive criticism.
The system will keep all data that is inserted, but to the user it will appear as a FIFO that deletes the oldest points. If you upgrade your account, however, you will gain access to old data that is stored, that you couldn't see before upgrading.
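To make the "keeps everything, shows a FIFO" behaviour concrete, here is a minimal sketch in Python. The class name, method names, and limits are all hypothetical and are not taken from the actual pushdata.io implementation; the sketch only assumes that the visible window is the newest N points.

```python
class RetainedSeries:
    """Stores every point, but exposes only the newest `visible_limit` points."""

    def __init__(self, visible_limit: int):
        if visible_limit < 1:
            raise ValueError("visible_limit must be at least 1")
        self.visible_limit = visible_limit
        self._points = []  # nothing is ever deleted from this list

    def append(self, timestamp: str, value: float) -> None:
        self._points.append((timestamp, value))

    def visible(self) -> list:
        # The user sees a FIFO: only the most recent points within the limit.
        return self._points[-self.visible_limit:]

    def upgrade(self, new_limit: int) -> None:
        # Raising the limit exposes older points that were stored all along.
        if new_limit < 1:
            raise ValueError("new_limit must be at least 1")
        self.visible_limit = new_limit


series = RetainedSeries(visible_limit=3)
for i in range(5):
    series.append(f"2018-01-01 01:01:0{i}", float(i))

print(len(series.visible()))  # 3
series.upgrade(10)
print(len(series.visible()))  # 5
```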
It's an interesting idea to charge business users for usage instead of forcing them into a tier. The only issue is that you'll then have to provide them with detailed information on their usage, meaning a bit of work making sure things are logged properly and then turned into invoices. But it's definitely an interesting idea.
And I really appreciate the straight feedback. It is the best kind of feedback. Thanks.
To me those tiers are more like tiny-to-small scale :)
The problem with these small numbers is that it's trivial to put them into really any DB you can think of. Even SQLite will handle this amount of data with ease. I can only see people who know nothing about databases using this service, or someone who for some reason can't use a DB. 1000 datapoints is at most maybe 100 kilobytes of data if one uses a very unoptimized format, and can easily be less than a kilobyte if done properly. At a million datapoints (your highest tier) we're talking about 1 megabyte to maybe 100 megabytes (again, if done very poorly). It's really a trivial amount of data. A standard Postgres or MySQL DB will be able to run queries over this dataset in less than a second, and that's using something that's not at all optimized for time series.
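The size estimates above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the per-point sizes (about 1 byte for a compact encoding, about 100 bytes for a verbose text format) are assumptions inferred from the figures in the comment, and the helper names are made up for illustration.

```python
# Assumed per-point sizes, inferred from the figures quoted above.
COMPACT_BYTES_PER_POINT = 1    # tightly packed or compressed encoding
VERBOSE_BYTES_PER_POINT = 100  # unoptimized text format


def estimate_bytes(points: int, bytes_per_point: int) -> int:
    return points * bytes_per_point


def human(n_bytes: int) -> str:
    # Decimal units (1 KB = 1000 bytes) to match the rough figures in the text.
    for unit, size in (("MB", 1_000_000), ("KB", 1_000)):
        if n_bytes >= size:
            return f"{n_bytes / size:g} {unit}"
    return f"{n_bytes} B"


for points in (1_000, 1_000_000):
    low = human(estimate_bytes(points, COMPACT_BYTES_PER_POINT))
    high = human(estimate_bytes(points, VERBOSE_BYTES_PER_POINT))
    print(f"{points} points: {low} to {high}")
# 1000 points: 1 KB to 100 KB
# 1000000 points: 1 MB to 100 MB
```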
> It's an interesting idea to charge business users for usage instead of forcing them into a tier. The only issue is that you'll then have to provide them with detailed information on their usage, meaning a bit of work making sure things are logged properly
But you already need to log their usage to know if they fall into whatever tiers limits. And you can store this info in... a time series DB :)
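As a rough illustration of that point, the same usage counters can drive both the tier check and a usage-based invoice. Everything in this sketch is hypothetical: the unit prices and field names are invented for the example, and only the one-million-point limit comes from the highest tier mentioned earlier in the thread.

```python
from dataclasses import dataclass

# Hypothetical unit prices in cents; not taken from any real price list.
CENTS_PER_1000_API_CALLS = 5
CENTS_PER_1000_POINTS_STORED = 2


@dataclass
class Usage:
    api_calls: int
    points_stored: int


def exceeds_tier(usage: Usage, max_points: int) -> bool:
    # The check a tiered plan already has to make.
    return usage.points_stored > max_points


def invoice_cents(usage: Usage) -> int:
    # The same counters, priced per unit instead of capped.
    calls = usage.api_calls * CENTS_PER_1000_API_CALLS // 1000
    storage = usage.points_stored * CENTS_PER_1000_POINTS_STORED // 1000
    return calls + storage


month = Usage(api_calls=200_000, points_stored=1_500_000)
print(exceeds_tier(month, max_points=1_000_000))  # True
print(invoice_cents(month))                       # 4000
```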
I think you're missing the point about self-hosted DBs though - people will not choose to put their data on pushdata.io because their own database cannot handle the amount of data they have, but because it is more convenient. Many people, even those who do know a little about databases, do not want to maintain another server [instance] and all that comes with doing that (like backups, for instance). They don't want to set it up, and they don't want to maintain or be responsible for it. This choice has nothing to do with performance or size of the data set, but with convenience and peace of mind.
Hey, it's a really cool idea to log usage data in the time series db! That is real dogfooding if I ever saw it. I will definitely try to do that :)
I am not so sure. Most projects already need a DB, and it's really trivial to store this small amount of data in there as well. I am not sure it is easier to maintain another account with a third party, write an HTTP POST instead of an SQL INSERT, and write HTTP GET + parsing + processing code instead of an SQL SELECT with the processing right in the query. One also already needs to do backups of the main data, and those backups would just include this timeseries data as well. They most likely don't need another instance, another DB etc. for this amount of data; they just stick it into their existing one.
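For comparison, here is roughly what the "SQL INSERT plus SELECT with processing in the query" route looks like, using Python's built-in sqlite3 module and an in-memory database. The table name, column names, and sample data are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE points ("
    "series TEXT NOT NULL, ts INTEGER NOT NULL, value REAL NOT NULL)"
)

# 1000 synthetic daily points for a hypothetical "signups" series.
rows = [("signups", day * 86400, float(day % 7)) for day in range(1000)]
conn.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)

# Aggregation happens in the query, so no client-side processing is needed.
count, lowest, highest, average = conn.execute(
    "SELECT COUNT(*), MIN(value), MAX(value), AVG(value) "
    "FROM points WHERE series = ?",
    ("signups",),
).fetchone()

print(count, lowest, highest)  # 1000 0.0 6.0
```

Grouping works the same way, for example by adding `GROUP BY ts / 604800` to bucket the points into weeks.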
My point being that yes I agree that people reach for third party services out of convenience and not scalability needs BUT this is such a tiny piece of a bigger puzzle that the overhead it comes with might not be worth it. Removing the need to host the timeseries storage part of a product themselves does not remove the need for a DB for 99.99% of projects. If it is trivial to do something myself, then I will do it myself. But if doing it myself takes considerable effort or requires ongoing maintenance overhead, then I might consider a third party service. The higher the value of your service, the better one can make an argument for oneself to use it. E.g. letting people store much more data, letting them process it etc.
Points pro using a third party service:
1. Too much time to develop
2. Too much maintenance overhead
3. Too difficult (to scale, develop, ...)
4. Economically infeasible (because no economy of scale benefit, no synergies etc)
Points contra using a third party service:
1. Giving up control over data
2. Legal problems ("where is the data stored?", "am I allowed to hand this data out?")
3. Overhead of interacting with a third party (account maintenance, payments, integration)
4. Inflexibility ("what if I need to change my existing data?", "what if I need some new feature?")
As I see it, the "pro" points don't apply at this scale. My remark about the tiny scale and featureset is not really about data volume but about the overall "difficulty to do and keep running".
If I offer a logging service with a REST API to store and retrieve up to 1000 lines of text without any way to filter/search/aggregate/process, would you use that because it's more convenient or would you just log things to a local file or syslog and call it a day?
Like you said it's v1.0 and I'm sure you'll add more stuff, I'm just suggesting that you'll definitely have to add quite a bit of stuff to catch business users. I hope you can make it, I respect everyone who builds a product themselves and tries to make a business out of it. Best of luck!
As for the log file example, sure I would pay someone to store 1000 lines of log data if the data was important to keep. Log files rarely are. But BI data, like new user registrations per day or week is something you never want to lose, and it is not a lot of data. I would pay someone $50/year if it meant I could store some simple BI metrics for my startup, at the resolution I needed, and not have to worry about finding a place in some existing db for it, or set up a new db. Come to think of it, this is another dogfooding opportunity - I have to feed pushdata.io BI metrics into timeseries on pushdata.io! :)
Anyway, I hear you and I thank you for the input. You may be right, we will just have to see. I'll be happy with whatever target audience I can find that is willing to pay more than it costs me to serve them.
The new thing here, of course, is not the ability to store time series data. The new/unique thing is the low barrier to entry. The API is extremely small and simple to use and learn, and there is no registration required to start using the service. I tried to make onboarding as frictionless as possible, and I think it’s hard to make it simpler than it is now.
All feedback is greatly appreciated!
Some perhaps useful links:
API docs: https://speca.io/ragnarlonn/pushdata-io
Blog article about making pushdata.io: https://bit.ly/2RGuIxc
"c": [1, 2, 3, 4]
occurred: "2018-01-01 01:01:01"
"c": [2, 3, 4, 5]
occurred: "2018-01-01 01:01:02"
- Why does one POST the data in the URL? I tried to insert a "_" value and got the homepage as HTML with a 200 status code. I suggest using the HTTP body for that.
- GET requests don't seem to respect the "Accept: " HTTP header. What if I want to get an XML-structured response? "Accept: application/xml"
- It's unclear how to sign your requests and whether there is any authentication. If someone knows my email and/or the name of my db, can they insert arbitrary data? Is there any private key or token exchange to verify POST requests?
Edit 1: Formatting
And yeah, JSON is the only supported encoding right now. I originally had the system output plain text actually, with an option to get JSON instead, but people who saw that thought it was very weird, so the plain text support was dropped in a rewrite from Python -> Go. You're right that other common formats, like XML or CSV, should be supported. I'll put it on the issues list.
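Supporting more formats could be driven by the "Accept" header, as suggested above. The following is a simplified sketch, not how pushdata.io works: the function name and field names are hypothetical, it only knows JSON and CSV, it ignores q-values, and it falls back to JSON for anything it does not recognise.

```python
import csv
import io
import json


def render(points, accept="application/json"):
    """Return (content_type, body) for a list of (time, value) pairs."""
    # Naive parsing: split on commas and drop parameters such as ";q=0.8".
    offered = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    for media_type in offered:
        if media_type == "text/csv":
            buf = io.StringIO()
            writer = csv.writer(buf, lineterminator="\n")
            writer.writerow(["time", "value"])
            writer.writerows(points)
            return "text/csv", buf.getvalue()
        if media_type in ("application/json", "*/*"):
            break
    # Default and fallback: JSON, the only encoding supported today.
    body = json.dumps([{"time": t, "value": v} for t, v in points])
    return "application/json", body


points = [("2018-01-01T01:01:01Z", 1.5)]
print(render(points, "text/csv")[0])         # text/csv
print(render(points, "application/xml")[0])  # application/json
```

A stricter server might answer an unsupported type with 406 Not Acceptable instead of silently falling back.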
Authentication: you post a data point to register an account. An account that has just been created has no API key, no security: anyone can access it and post or fetch data to/from it. To secure the account you have to confirm it, which is done by opening the URL you get in your confirmation email. When that URL is accessed, an API key will be created for you and the account is confirmed.
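The flow described above can be sketched as a tiny state machine. This is an illustration under assumptions, not the service's real code: the class and method names are invented, and the key format (a random hex token) is a guess.

```python
import hmac
import secrets


class Account:
    def __init__(self, email: str):
        self.email = email
        self.api_key = None  # a freshly created account has no key

    def confirm(self) -> str:
        # Called when the confirmation URL from the email is opened.
        if self.api_key is None:
            self.api_key = secrets.token_hex(16)
        return self.api_key

    def is_authorized(self, presented_key=None) -> bool:
        if self.api_key is None:
            # Unconfirmed: anyone can post to or fetch from the account.
            return True
        if presented_key is None:
            return False
        # Constant-time comparison of the stored and presented keys.
        return hmac.compare_digest(
            self.api_key.encode(), presented_key.encode()
        )


account = Account("someone@example.com")
print(account.is_authorized())     # True
key = account.confirm()
print(account.is_authorized())     # False
print(account.is_authorized(key))  # True
```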
Thinking about it, it might be better to just send out the API key, and then consider the account confirmed once the first request using the API key comes in. Hmm. I didn't want to send the key over email, but it is easy to change it on the site, so maybe that's not a big issue. I'll have to think about that. I understand the current process can be confusing because it is not standard.
Thanks for the feedback!
- the main benefit is that I don’t need to write or maintain code around storing, retrieving, and hopefully visualizing time series data
- pricing looks fine. Maybe even a little low for Business. But I’d have to try it out to understand if the usage thresholds are too high / low. May want to have a plan between 75 and 450 - that’s a big, painful jump
- May want to think about spending some time editing your copy so it’s crisper. A lot of it can be put into e.g. an FAQ (why not to use a temp email, for example).
- some examples in other languages would be useful (Ruby, Python, JS, etc)
- I like that I can create an account with just a curl POST. Good stuff.
- one concern is how fast is retrieval? If I’m retrieving data in a request thread I want it to be FAST. One factor here is server performance. If you’re using eg AWS to host that would give me some confidence that you can scale up beyond a hamster wheel if necessary
- no way to try this out on mobile, which is I suspect where a lot of people will first see this. How can you remind them to come back when they’re at a desktop? Collect an email and remind them maybe?
Haven’t tried the service yet but assuming it works as advertised it’s a good start.
The thresholds are pretty much guesswork - they're probably a bit off but I'll adjust them when I know more about how people use the service. One thing that might not be immediately obvious is the amount of data you can store at the different price points. The number of time series and the number of points per series go up, which means a lot more potential data storage each time you go up a level. More storage means more transfers, and so I set the levels and pricing based on worst-case calculations for AWS bandwidth usage costs.
And yes, it's hosted on AWS at the moment. Virginia. It should be reasonably fast for US-based users, but if it turns out to be viable I might look at distributing it more. As for scalability, it is designed to be scalable, but currently there is a bottleneck in the form of a lookup database that is hit for each request for timeseries data. If I add caching of those lookups, it should scale well horizontally - I just add more API servers and timeseries databases.
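Caching those lookups could be as simple as memoizing the lookup function on each API server. The sketch below uses Python's functools.lru_cache as a stand-in; the lookup table, series identifier format, and function names are hypothetical, and the real service is written in Go.

```python
from functools import lru_cache

# Hypothetical stand-in for the lookup database: series id -> timeseries DB host.
LOOKUP_DB = {"someone@example.com/temperature": "tsdb-1"}
stats = {"db_hits": 0}


def query_lookup_db(series_id: str) -> str:
    # In the real system this would be a network round trip per request.
    stats["db_hits"] += 1
    return LOOKUP_DB[series_id]


@lru_cache(maxsize=10_000)
def locate_series(series_id: str) -> str:
    # Repeated requests for the same series are answered from memory.
    return query_lookup_db(series_id)


for _ in range(3):
    locate_series("someone@example.com/temperature")

print(stats["db_hits"])  # 1
```

The catch is invalidation: if a series is moved to another timeseries database, cached entries need to expire or be cleared, for example with `locate_series.cache_clear()`.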
I'll definitely be adding code examples. Perhaps also documentation/howtos for interfacing via IFTTT, Zapier etc?
Oh, and the copy etc - I wanted to minimize the number of pages, so I put all the most useful info on the front page :) I definitely agree that e.g. an FAQ section would be nice.
Thanks again for some solid feedback!
Maybe you can use it for inspiration, it's called Gweet:
Fantastic job with this, I can't wait to try it
Why not make sure the email address has been validated before allowing it to be automatically created?
Because right now someone could spam the site's servers with millions of unverified emails...
So, in short: yes, the system could be used for email spamming, but not very effectively. There are better options if all you want to do is spam people.
I hope you make sure that you get high in google ranking.
A while back (about a year ago) I had the need for something like this and kept searching on google for such a service.
Surely somebody must have written it, I thought.
Well, I ended up accepting that either it did not exist or if it existed it was impossible to find (which is the same thing from my angle).
So again hope people find you once the hackernews show traffic fades away...
I am curious how this is different from multiple other hosted graphite/grafana solutions. All of them allow for time-series to be stored and retrieved/graphed.
Complexity always comes back to get you. All these frameworks focus on churning out code in weeks and make stuff impossible to maintain in a few years - that's the tradeoff.
It's no surprise FAANG and especially Amazon focus so much on simplifying their infra and write stuff from scratch so often.