Show HN: Pushdata.io – Ultra simple time series data storage (pushdata.io)
106 points by rlonn 7 days ago | 48 comments





Regarding the free account limit: 1000 data points is pretty much nothing in the time series world. What happens if you reach that limit, does it just delete the oldest point? That would give you about a month's worth of data at one data point per hour, or about 16 hours at one point per minute. Though at one point per minute you'd also exceed the rate limit of 1000 API calls per day. So you're stuck with one data point per hour, I guess, or maybe one per 5-minute interval and roughly 3 days of data retention. I guess one could argue "but it's free!".
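
Back of the envelope, assuming the cap works as a rolling window over the newest points:

    # Rough retention math for a 1000-point cap at different resolutions.
    CAP = 1000
    for label, points_per_day in [("1/minute", 24 * 60), ("1/5 min", 24 * 12), ("1/hour", 24)]:
        print(f"{label:9s} -> {CAP / points_per_day:5.1f} days of history")
    # 1/minute  ->   0.7 days
    # 1/5 min   ->   3.5 days
    # 1/hour    ->  41.7 days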

There's no way to do any processing of the data, like aggregations, min/max, grouping etc., meaning you always have to fetch the raw data points and do the processing yourself. This kinda works because the number of datapoints this supports is so incredibly low. Seeing billions of datapoints across thousands of time series is not that uncommon, and you wouldn't be able to use this service for those cases even with the "Business Pro" account at $450/month.
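
For illustration, here's roughly what "do the processing yourself" ends up looking like. The URL and the response shape are assumptions on my part, not the documented API:

    # Client-side aggregation, since the service only returns raw points.
    # Response shape ({"points": [{"time": ..., "value": ...}, ...]}) is assumed.
    import json
    import statistics
    import urllib.request

    def fetch_points(url):
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["points"]

    points = fetch_points("https://pushdata.io/you@example.com/cpu_load")  # hypothetical URL
    values = [p["value"] for p in points]
    print("min:", min(values), "max:", max(values), "mean:", statistics.mean(values))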

When it comes to serious business users, you should make sure you don't impose any hard limits on them; instead, charge them for their usage. You point out how scalable the service is, but you don't allow customers to scale their usage of it. Consider why every cloud vendor (AWS, Google, Azure) charges per API call, per amount of objects stored, and so on. They are happy to take your 10 petabytes of data and charge you handsomely for it.

Overall it seems like a service for toy projects or projects with very, very low amounts of data. But then again, you'd be able to store this amount of data in any SQL DB without a problem while retaining control of your data, having much more flexibility, getting more speed out of it, and being able to do computations on the data more easily. This is just too limited, both in terms of amount of data and in terms of feature set.

I hope this comment doesn't sound too harsh, I intended it as constructive criticism.


The idea was that the service should not compete directly with the "big data" guys. It should be really, really simple, and for small-to-medium scale usage (in terms of number of data points stored). Having said that, I am definitely not sure about 1000 data points being a reasonable free level. We will see, I guess. Perhaps you're right that a minimum free tier should let people save at least a couple of days' worth of data at 1-minute resolution. I don't know, but I was afraid that if I set the free limit too high, no one would upgrade. It is easier to raise the free limit (i.e. become more generous) when you realize it is set too low than to do the reverse. The same goes for the business tiers: I have no idea if they're right or not.

I just think that the people who store billions and billions of data points are much better served going to the "big data" guys, or setting up their own in-house db. They are not likely customers for pushdata.io even if I let them store a ton more data, because they'll have much higher demands on functionality too.

I think pushdata.io is for small-to-medium scale usage, and one hope/belief I have is that people and companies often start out small when they start collecting data, and their needs grow over time. Initially many probably just want to get started and want a simple storage solution quickly. If they choose pushdata.io they are then likely to upgrade their account until they outgrow the service, which is when they'll move their data somewhere else. I'm fine with that. Many will probably never reach that point, or pushdata will be developed to support larger-scale usage. This is V1.0, after all :)

The system will keep all data that is inserted, but to the user it will appear as a FIFO that deletes the oldest points. If you upgrade your account, however, you will gain access to old data that is stored, that you couldn't see before upgrading.

It's an interesting idea to charge business users for usage instead of forcing them into a tier. The only issue is that you'll then have to provide them with detailed information on their usage, meaning a bit of work making sure things are logged properly and then turned into invoices. But it's definitely an interesting idea.

And I really appreciate the straight feedback. It is the best kind of feedback. Thanks.


> It should be really, really simple and for small-to-medium scale usage (in terms of number of data points stored).

To me those tiers are more like tiny-to-small scale :)

The problem with these small numbers is that it's trivial to put them into really any DB you can think of. Even SQLite will handle this amount of data with ease. I can only see people who know nothing about databases using this service, or someone who for some reason can't use a DB. 1000 datapoints is at most maybe 100 kilobytes of data if you use a very unoptimized format, and can easily be less than a kilobyte if done properly. At a million datapoints (your highest tier) we're talking about 1 megabyte to maybe 100 megabytes (again, if done very poorly). It's really a trivial amount of data. A standard Postgres or MySQL DB will be able to run queries over this dataset in less than a second, and that's using something that's totally not optimized for time series.
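
For comparison, the "just use SQLite" version is only a few lines (a sketch using Python's built-in sqlite3 module):

    # Storing time series points in plain SQLite -- no server, nothing to maintain beyond a file.
    import sqlite3

    conn = sqlite3.connect("metrics.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS points (
        series TEXT NOT NULL,
        ts     TEXT NOT NULL,   -- ISO 8601 timestamp
        value  REAL NOT NULL
    )""")
    conn.execute("INSERT INTO points VALUES (?, ?, ?)",
                 ("signups", "2018-12-01T00:00:00Z", 42.0))
    conn.commit()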

> It's an interesting idea to charge business users for usage instead of forcing them into a tier. The only issue is that you'll then have to provide them with detailed information on their usage, meaning a bit of work making sure things are logged properly

But you already need to log their usage to know if they fall within whatever tier's limits. And you can store this info in... a time series DB :)


The marketer in me doesn't like the phrase "tiny-to-small scale". It's the same as when you order a latte at Starbucks and they only have various versions of "big" to sell you ;)

I think you're missing the point about self-hosted db's though - people will not choose to put their data on pushdata.io because their own database cannot handle the amount of data they have, but because it is more convenient. Many people, even those who do know a little about databases, do not want to maintain another server [instance] and all that comes with doing that (like backups, for instance). They don't want to set it up, they don't want to maintain or be responsible for it. This choice has nothing to do with performance or size of the data set, but with convenience and peace of mind.

Hey, it's a really cool idea to log usage data in the time series db! That is real dogfooding if I ever saw it. I will definitely try to do that :)


> I think you're missing the point about self-hosted db's though - people will not choose to put their data on pushdata.io because their own database cannot handle the amount of data they have, but because it is more convenient. Many people, even those who do know a little about databases, do not want to maintain another server [instance] and all that comes with doing that (like backups, for instance). They don't want to set it up, they don't want to maintain or be responsible for it. This choice has nothing to do with performance or size of the data set, but with convenience and peace of mind.

I am not so sure. Most projects already need a DB, and it's really trivial to store this small amount of data in there as well. I am not sure it is easier to maintain another account with a third party, write an HTTP POST instead of a SQL INSERT, and write HTTP GET + parsing + processing code instead of a SQL SELECT with the processing right in the query. One also already needs to back up their main data, and those backups would just include this timeseries data as well. They most likely don't need another instance, another DB etc. for this amount of data; they just stick it into their existing one.
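
To illustrate the "processing right in the query" part, a sketch against a plain SQLite table of (series, ts, value) rows:

    # Weekly totals computed inside the query -- no fetch-then-aggregate round trip.
    import sqlite3

    conn = sqlite3.connect("metrics.db")
    for week, total in conn.execute("""
            SELECT strftime('%Y-%W', ts) AS week, SUM(value)
            FROM points
            WHERE series = 'signups'
            GROUP BY week
            ORDER BY week"""):
        print(week, total)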

My point being: yes, I agree that people reach for third-party services out of convenience and not scalability needs, BUT this is such a tiny piece of a bigger puzzle that the overhead it comes with might not be worth it. Removing the need to host the timeseries storage part of a product themselves does not remove the need for a DB for 99.99% of projects. If it is trivial to do something myself, then I will do it myself. But if doing it myself takes considerable effort or requires ongoing maintenance overhead, then I might consider a third party service. The higher the value of your service, the better one can make an argument for using it. E.g. letting people store much more data, letting them process it etc.

Points pro using a third party service: 1. Too much time to develop 2. Too much maintenance overhead 3. Too difficult (to scale, develop, ...) 4. Economically unfeasible (because no economy-of-scale benefit, no synergies etc)

Points contra using a third party service: 1. Giving up control over data 2. Legal problems ("where is the data stored?", "am I allowed to hand this data out?") 3. Overhead of interacting with a third party (account maintenance, payments, integration) 4. Inflexibility ("what if I need to change my existing data?", "what if I need some new feature?")

As I see it, the "pro" points don't apply at this scale. With such a tiny scale and feature set, the question isn't really about data volume but about the overall difficulty of building it and keeping it running.

If I offer a logging service with a REST API to store and retrieve up to 1000 lines of text without any way to filter/search/aggregate/process, would you use that because it's more convenient or would you just log things to a local file or syslog and call it a day?

Like you said, it's v1.0 and I'm sure you'll add more stuff; I'm just suggesting that you'll definitely have to add quite a bit to catch business users. I hope you can make it; I respect everyone who builds a product themselves and tries to make a business out of it. Best of luck!


I'm not sure you have a DB in all cases. The situation that made me think of this service was exactly that: we (a startup) had no easily accessible db where we could put the data for our fledgling BI project. We were using a Google spreadsheet and entering things manually into it, things such as new subscriptions and cancellations each week. Putting that data into the main user db our SaaS product was using meant disturbing our developers, who were very busy and also not too keen on contaminating the production db with BI data :) I think this lack of an existing db is common among small businesses and hobbyists who are collecting (or want to collect) various forms of time series data from various sources.

You may be right that this is not a service that will attract many business users willing to pay for the higher tiers, and that it is more suitable for small- (or tiny-) scale use, but that's the beauty of an MVP: you don't spend too much effort on it and you let the users guide development. If it turns out only serious IoT hobbyists buy, and they only buy the "Personal" subscription, then that's the target audience and the tier to focus on.

As for the log file example: sure, I would pay someone to store 1000 lines of log data if the data was important to keep. Log files rarely are. But BI data, like new user registrations per day or week, is something you never want to lose, and it is not a lot of data. I would pay someone $50/year if it meant I could store some simple BI metrics for my startup, at the resolution I needed, and not have to worry about finding a place in some existing db for it, or setting up a new db. Come to think of it, this is another dogfooding opportunity - I have to feed pushdata.io's BI metrics into time series on pushdata.io! :)

Anyway, I hear you and I thank you for the input. You may be right, we will just have to see. I'll be happy whatever target audience I can find that are willing to pay more than it costs me to serve them.


Hi HN, I built this thing! The idea came a few years back when an angel investor in the company I had founded and I were involved in a BI project, defining and collecting some basic KPIs for the business. At one point, when we were thinking about where to store the very limited set of time series data that made up our complete set of KPI metrics, the investor said “why isn’t there some service where you can just send a few numeric data points for storage, and get them back, without a lot of hassle?”. I really liked it: a very simple idea, easy to implement, but one that still felt like it might be new/unique.

The new thing here, of course, is not the ability to store time series data. The new/unique thing is the low barrier to entry. The API is extremely small and simple to use and learn, and there is no registration required to start using the service. I tried to make onboarding as frictionless as possible, and I think it’s hard to make it simpler than it is now.

All feedback is greatly appreciated!

Some perhaps useful links:

Site: https://pushdata.io

API docs: https://speca.io/ragnarlonn/pushdata-io

Blog article about making pushdata.io: https://bit.ly/2RGuIxc


It might be a good idea to add more RESTful handling of ingestion. Something similar to the following:

   POST /{account}/{series}

   [
      {
        "metrics": {
           "a": 10,
           "b": 20,
           "c": [1, 2, 3, 4]
        },
        "occurred": "2018-01-01 01:01:01"
      },
      {
        "metrics": {
           "a": 20,
           "b": 30,
           "c": [2, 3, 4, 5]
        },
        "occurred": "2018-01-01 01:01:02"
      }
   ]

The overhead of an HTTP call for every data point submission might not work well in low-bandwidth situations. Also, in cases where you have intermittent internet connections, being able to supply a timestamp for when an event happened is helpful.
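
For example, a rough client-side sketch of the buffering I have in mind (using the payload shape suggested above; the endpoint is made up):

    # Buffer readings locally with their own timestamps, then flush them in one
    # batch POST when a connection is available (payload shape as suggested above).
    import json
    import time
    import urllib.request

    buffer = []

    def record(metrics):
        buffer.append({"metrics": metrics, "occurred": time.strftime("%Y-%m-%d %H:%M:%S")})

    def flush(url):
        global buffer
        if not buffer:
            return
        req = urllib.request.Request(url, data=json.dumps(buffer).encode(),
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)  # one HTTP round trip for the whole batch
        buffer = []

    record({"a": 10, "b": 20})
    record({"a": 20, "b": 30})
    flush("https://example.invalid/myaccount/myseries")  # made-up endpoint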

There is an /api/timeseries end point where you can POST multiple data points with one single call. While putting that end point under /{email}/{series} may be more intuitive, I wanted to make it obvious that it is separate, as that end point is a premium feature. See https://speca.io/ragnarlonn/pushdata-io#extended-api

I love the idea but some points:

- Why does one POST the data in the URL? I tried to insert a "_" value and got the homepage as HTML with a 200 status code. I suggest using the HTTP body for that.

- GET requests don't seem to respect the "Accept:" HTTP header. What if I want to get an XML-structured response? "Accept: application/xml"

- It's unclear how to sign your requests and whether there is any authentication. If someone knows my email and/or the name of a db, can they insert arbitrary data? Is there any private key or token exchange to verify POST requests?

Edit 1: Formatting


Uhm, that POST should probably return something other than 200. The default is to serve index.html if the server doesn't recognize the URL, but in this case it is pretty obvious that you should get an error code back (400 - Bad Request seems logical).

And yeah, JSON is the only supported encoding right now. I originally had the system output plain text, with an option to get JSON instead, but people who saw that thought it was very weird, so the plain text support was dropped in a rewrite from Python -> Go. You're right that other common formats, like XML or CSV, should be supported. I'll put it on the issues list.

Authentication: you post a data point to register an account. An account that has just been created has no API key, no security: anyone can access it and post or fetch data to/from it. To secure the account you have to confirm it, which is done by opening the URL you get in your confirmation email. When that URL is accessed, an API key will be created for you and the account is confirmed.

Thinking about it, it might be better to just send out the API key, and then consider the account confirmed once the first request using the API key comes in. Hmm. I didn't want to send the key over email, but it is easy to change it on the site, so maybe that's not a big issue. I'll have to think about that. I understand the current process can be confusing because it is not standard.

Thanks for the feedback!


Quite a few comments praising this service seem to be from users with only 1 comment.

Maybe a mistake to tell everyone I know about it and that I'd make a launch post on HN today... Sweden is awake right now, so there's probably an over-representation of locals, but it should correct itself once the US wakes up. I would like some useful criticism.

At least one of them is 4+ years old, and rlonn is a veteran. I'm inclined to give the benefit of the doubt in this case. Bit of a lesson here for "show HN" hopefuls though to not have too many greentext usernames piling on...

Love the idea, it was head-smackingly obvious when I saw your Show HN. Handling this is a PITA in an app and I’d much rather pay for a 3rd party service to handle it.

Some feedback:

- the main benefit is that I don’t need to write or maintain code around storing, retrieving, and hopefully visualizing time series data

- pricing looks fine. Maybe even a little low for Business. But I’d have to try it out to understand if the usage thresholds are too high / low. May want to have a plan between 75 and 450 - that’s a big, painful jump

- May want to think about spending some time editing your copy so it’s crisper. A lot of it can be put into, e.g., an FAQ (why not to use a temp email, for example).

- some examples in other languages would be useful (Ruby, Python, JS, etc)

- I like that I can create an account with just a curl POST. Good stuff.

- one concern is how fast retrieval is. If I’m retrieving data in a request thread I want it to be FAST. One factor here is server performance. If you’re using e.g. AWS to host, that would give me some confidence that you can scale up beyond a hamster wheel if necessary

- no way to try this out on mobile, which is I suspect where a lot of people will first see this. How can you remind them to come back when they’re at a desktop? Collect an email and remind them maybe?

Haven’t tried the service yet but assuming it works as advertised it’s a good start.

Congratulations!


This is fantastic feedback - thanks!

The thresholds are pretty much guesswork - they're probably a bit off but I'll adjust them when I know more about how people use the service. One thing that might not be immediately obvious is the amount of data you can store at the different price points. The number of time series and the number of points per series go up, which means a lot more potential data storage each time you go up a level. More storage means more transfers, and so I set the levels and pricing based on worst-case calculations for AWS bandwidth usage costs.

And yes, it's hosted on AWS at the moment, in Virginia. It should be reasonably fast for US-based users, but if it turns out to be viable I might look at distributing it more. As for scalability, it is designed to be scalable, but currently there is a bottleneck in the form of a lookup database that is hit for each request for timeseries data. If I add caching of those lookups, it should scale well horizontally - I just add more API servers and timeseries databases.
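
Roughly something like a small in-memory cache with a TTL in front of that lookup (an illustrative Python sketch, not the actual code - the backend is Go and the numbers are made up):

    # Illustrative TTL cache in front of the series-lookup database.
    import time

    TTL = 60          # seconds before a cached lookup is refreshed (made-up number)
    _cache = {}       # series key -> (expiry time, lookup result)

    def lookup_series(key, fetch_from_db):
        now = time.time()
        hit = _cache.get(key)
        if hit and hit[0] > now:
            return hit[1]                  # served from memory, no lookup-db round trip
        result = fetch_from_db(key)        # only cache misses hit the lookup database
        _cache[key] = (now + TTL, result)
        return result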

I'll definitely be adding code examples. Perhaps also documentation/howtos for interfacing via IFTTT, Zapier etc?

Oh, and the copy etc - I wanted to minimize the number of pages, so I put all the most useful info on the front page :) I definitely agree that e.g. an FAQ section would be nice.

Thanks again for some solid feedback!


This is very interesting, good job. I made something a bit simpler back in the day; it's a single server binary you can run (and I run a demo server you can use), but I think yours has a few more features.

Maybe you can use it for inspiration, it's called Gweet:

https://github.com/skorokithakis/gweet


Ah, and built in Go also. I'll check it out :)

That is actually pretty neat.

I have been looking for a service exactly like this for storing/retrieving data for Siri Shortcuts! Integrations with Firebase, Google Docs, etc are too difficult, but this API is very easily supported by Shortcuts. In fact without needing a user to register, you've opened up a huge group of users who can just download a shortcut and use it without ever needing to touch your service.

Fantastic job with this, I can't wait to try it


That sounds like an interesting use case. If you want, I'll give you a free premium subscription if I can do a case study on your problem and the solution.

Yes absolutely! How should I contact you?

Send me something on hello at pushdata. io and I'll respond from my regular email.

It says that an account is automatically created when you push to a URL with an email not seen before. Then it goes on to say that there's no security, it's not recommended, etc...

Why not make sure the email address has been validated before allowing it to be automatically created?

Because right now someone could spam the site's servers with millions of unverified emails...


There are rate limits in place to prevent email spamming. The system stops creating new accounts for a domain where the percentage of unconfirmed accounts is too high. There is also rate limiting on client IP, disallowing client IPs from creating lots of unconfirmed accounts. I am also planning (but haven't got around to it yet) to add exponential backoff to email sendouts, so that an increase in the sendout rate causes sendouts to become slower and slower (until manual intervention raises the limits).
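
Roughly, the checks look like this (a simplified illustration in Python - the real limits are different and the actual implementation is in Go):

    # Simplified illustration of the account-creation checks (thresholds are made up).
    MAX_UNCONFIRMED_RATIO = 0.5     # per email domain
    MAX_NEW_ACCOUNTS_PER_IP = 5     # per client IP and day

    def may_create_account(domain_stats, ip_stats):
        # Stop creating accounts for a domain where too many accounts are unconfirmed.
        total = domain_stats["confirmed"] + domain_stats["unconfirmed"]
        if total > 0 and domain_stats["unconfirmed"] / total > MAX_UNCONFIRMED_RATIO:
            return False
        # Stop a single client IP from creating lots of unconfirmed accounts.
        if ip_stats["unconfirmed_today"] >= MAX_NEW_ACCOUNTS_PER_IP:
            return False
        return True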

So, in short: yes, the system could be used for email spamming, but not very effectively. There are better options if all you want to do is spam people.


My feedback: excellent work.

I hope you make sure that you rank high in Google. A while back (about a year ago) I needed something like this and kept searching Google for such a service.

Surely somebody must have written it, I thought.

Well, I ended up accepting that either it did not exist or if it existed it was impossible to find (which is the same thing from my angle).

So again, I hope people find you once the Show HN traffic fades away...

cheers


Yeah, we'll see. I'd need inbound links, and without a ton of publicly accessible content, those can be hard to get. I'm thinking maybe IoT communities would be interested and may link to the site, as the product could be quite useful to hobby IoT hackers with little or no budget for data storage, and not much interest in being db maintainers.

I've noticed with a lot of these "get started in seconds" services, you inevitably run into challenges down the road when the complications of trying to use them for a project come into full swing. Not to mention the shaky nature of cloud hosting with a company that could easily be out of business in 6 months.

I agree, but often the alternative to using a service that offers painless onboarding is to not do anything. I.e. you want to do X, and you're trying to find out how. You find product/service/component A, B and C that offer frictionless onboarding, and that lets you get started on X. If, on the other hand, the A, B and C you find all require you to spend a couple of days learning their APIs etc., then you may never get started on X because the task becomes too big. There is a lot to be said for getting started on something. Components can always be replaced.

This reminds me a lot of another product called Metric Board: https://metricboard.io/how-does-it-work

That's cool, I hadn't seen that one. It's very similar, but with more functionality and seems to require signup (you have to have an API key to store data points)

Maybe I am missing something..

I am curious how this is different from multiple other hosted graphite/grafana solutions. All of them allow for time-series to be stored and retrieved/graphed.


pushdata.io is simpler. You don't need to set up an account, the API is very, very small and simple, and the visualization is also much simpler than using Grafana. It is meant to be as frictionless as possible to get started with.

Wait so I can post data using any email with no security by default? That seems like a really bad idea. I get you’re trying to be clever but damn. Can’t possibly see that causing a problem.

What is the threat scenario you see?

Spamming the server with millions of unique but unverified email addresses, it will store anything you throw at it...

I wrote about that just now: there are rate limits in place to prevent it. You'd have to go to quite a bit of trouble to use the system for any kind of effective spamming: coming from many different client IPs, sending only a few emails from each of them, and not sending too many emails to the same domains (e.g. spamming gmail.com would be impossible).

This will be immediately useful for all of my little projects. Looking forward to metrics-without-headache. Thanks!

I liked the part about replacing the React code. Small is beautiful.

Sometimes I worry that I'm too old-school, doing things like throwing out frameworks everyone else is using. I see young(er) developers being incredibly productive using those frameworks plus components, libraries, etc., but then I also see that a lot of what they create looks the same, works the same, and contains the same bugs, because (IMO) they're sometimes over-using external code and using cannons to shoot flies. I'm probably a bit too far the other way, but it seems to work for me. I tend to be quite focused on minimizing external dependencies and only using what is absolutely necessary.

...looking incredibly productive rather than being.

Complexity always comes back to get you. All these frameworks focus on churning out code in weeks and make stuff impossible to maintain in a few years - that's the tradeoff.

It's no surprise that FAANG, and especially Amazon, focus so much on simplifying their infra and so often write stuff from scratch.


I absolutely agree! Built a 6-figure-MRR SaaS with exactly this attitude.

My attitude, or the other one? :)

Nice idea. Easy to plug in and use with IoT devices.

Really cool!! I'm looking forward to using this on my next project

Looks marvellous!

I like that I get a finished cURL example right away - so easy to start using. :)

Seems like a fantastic service!



