
Show HN: Pushdata.io – Ultra simple time series data storage - rlonn
https://pushdata.io
======
eis
Regarding the free account limit: 1000 data points in time series world is
pretty much nothing. What happens if you reach that limit, does it just delete
the oldest point? This would give you a month's worth of data at one data
point every hour, or 16 hours at one point per minute. Though at one point per
minute, you'd also exceed the rate limit of 1000 API calls per day. So you're
stuck with one data point per hour I guess or maybe one per 5 minute interval
and 3 days of data retention. I guess one could argue "but it's free!".

There's no way to do any processing of the data like aggregations, min/max,
grouping etc. meaning one would have to always fetch the raw data points and
then do processing yourself. This kinda works because the amount of datapoints
that this supports is so incredibly low. Seeing billions of datapoints over
thousands of time series is not that uncommon and you wouldn't be able to use
this service for those cases even with the "Business Pro" account for
$450/month.
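Without server-side aggregation, every consumer has to re-implement it client-side. A minimal Python sketch of what that looks like — the response shape (a JSON array of time/value objects) is a guess for illustration, not taken from the actual API:

```python
import json

# Hypothetical response body from a GET on a pushdata.io time series;
# the field names "time" and "value" are illustrative assumptions.
raw = json.loads("""
[
  {"time": "2018-12-01T00:00:00Z", "value": 12.5},
  {"time": "2018-12-01T01:00:00Z", "value": 14.0},
  {"time": "2018-12-01T02:00:00Z", "value": 11.0}
]
""")

# The min/max/avg a time series DB would normally compute for you,
# done by hand on the raw points.
values = [p["value"] for p in raw]
print("min:", min(values))
print("max:", max(values))
print("avg:", sum(values) / len(values))
```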

When it comes to serious business users, you should ensure that you don't
impose any hard limits on them, instead charge them for the usage. You point
out how scalable the service is but you don't allow customers to scale their
usage of it. Consider why every cloud vendor (AWS, Google, Azure) charges for
example per API call, per amount of objects stored etc. They are happy to take
your 10 Petabytes of data and charge you handsomely for it.

Overall it seems like a service for toy projects or projects with very very
low amounts of data. But then again you'd be able to store this amount of data
in any SQL DB without problem while retaining control of your data, having
much more flexibility, getting more speed out of it and being able to do
computations on the data in an easier way. This is just too limited, both in
terms of amount of data as well as featureset.

I hope this comment doesn't sound too harsh, I intended it as constructive
criticism.

~~~
rlonn
The idea was that the service should not compete directly with the "big data"
guys. It should be really, really simple and for small-to-medium scale usage
(in terms of number of data points stored). But having said that, I am
definitely not sure about 1000 data points being a reasonable free level. We
will see I guess. Perhaps you're right that a minimum free tier should let
people save at least a couple of days worth of data with 1-minute resolution.
I don't know, but I was afraid that if I set the paywall too high, no one would
upgrade. It is easier to raise a paywall (i.e. become more generous) when you
realize it is set too low, than the reverse. And the same goes for the
business tiers - I have no idea if they're right or not. I just think that the
people who do store billions and billions of data points are much better
served going to the "big data" guys, or setting up their own, in-house db.
They are not likely customers for pushdata.io even if I let them store a ton
more data, because they'll have much higher demands on functionality also. I
think pushdata.io is for small-to-medium scale usage, and one hope/belief I
have is that people and companies often start out small, when they start
collecting data, then their needs grow over time. Initially many probably just
want to get started, and want a simple storage solution quickly. If they
choose pushdata.io they are then likely to upgrade their account until they
outgrow the service, which is when they'll move their data somewhere else. I'm
fine with that. Many will never reach that point probably, or pushdata will be
developed to support larger-scale usage. This is V1.0, after all :)

The system will keep _all_ data that is inserted, but to the user it will
appear as a FIFO that deletes the oldest points. If you upgrade your account,
however, you will gain access to old data that is stored, that you couldn't
see before upgrading.

It's an interesting idea to charge business users for usage instead of forcing
them into a tier. The only issue is that you'll then have to provide them with
detailed information on their usage, meaning a bit of work making sure things
are logged properly and then turned into invoices. But it's definitely an
interesting idea.

And I really appreciate the straight feedback. It is the best kind of
feedback. Thanks.

~~~
eis
> It should be really, really simple and for small-to-medium scale usage (in
> terms of number of data points stored).

To me those tiers are more like tiny-to-small scale :)

The problem with these small numbers is that it's trivial to put them into
really any DB you can think of. Even SQLite will handle this amount of
data with ease. I can only see people who know nothing about databases
using this service, or someone who for some reason can't use a DB. 1000
datapoints is at most maybe 100 kilobytes of data if one uses a very
unoptimized format, and can easily be less than a kilobyte if done properly. At
a million datapoints (your highest tier) we're talking about 1 Megabyte to
maybe 100 Megabyte (again, if done very poorly). It's really a trivial amount
of data. A standard Postgres or MySQL DB will be able to run queries over this
dataset in less than a second and that's using something that's totally not
optimized for time series.
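That claim is easy to sanity-check: Python's standard-library sqlite3 module handles this volume trivially, aggregation included. A sketch with made-up series data:

```python
import sqlite3

# In-memory DB for the sketch; a file path would work the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (series TEXT, ts TEXT, value REAL)")

# 24 hourly data points for one series -- already more structure than needed.
conn.executemany(
    "INSERT INTO points VALUES ('cpu', ?, ?)",
    [(f"2018-12-01T{h:02d}:00:00Z", float(h)) for h in range(24)],
)

# min/max/avg in a single query -- the processing that would otherwise
# have to happen client-side after fetching raw points.
lo, hi, avg = conn.execute(
    "SELECT MIN(value), MAX(value), AVG(value) FROM points WHERE series = 'cpu'"
).fetchone()
print(lo, hi, avg)
```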

> It's an interesting idea to charge business users for usage instead of
> forcing them into a tier. The only issue is that you'll then have to provide
> them with detailed information on their usage, meaning a bit of work making
> sure things are logged properly

But you already need to log their usage to know if they fall into whatever
tiers limits. And you can store this info in... a time series DB :)

~~~
rlonn
The marketer in me doesn't like the phrase "tiny-to-small scale". It's the same
as when you order a latte at Starbucks and they only have various versions of
"big" to sell you ;)

I think you're missing the point about self-hosted db's though - people will
not choose to put their data on pushdata.io because their own database cannot
handle the amount of data they have, but because it is more convenient. Many
people, even those who _do_ know a little about databases, do not want to
maintain another server [instance] and all that comes with doing that (like
backups, for instance). They don't want to set it up, they don't want to
maintain or be responsible for it. This choice has nothing to do with
performance or size of the data set, but with convenience and peace of mind.

Hey, it's a really cool idea to log usage data in the time series db! That is
_real_ dogfooding if I ever saw it. I will definitely try to do that :)

~~~
eis
> I think you're missing the point about self-hosted db's though - people will
> not choose to put their data on pushdata.io because their own database
> cannot handle the amount of data they have, but because it is more
> convenient. Many people, even those who do know a little about databases, do
> not want to maintain another server [instance] and all that comes with doing
> that (like backups, for instance). They don't want to set it up, they don't
> want to maintain or be responsible for it. This choice has nothing to do
> with performance or size of the data set, but with convenience and peace of
> mind.

I am not so sure. Most projects already need a DB and it's really trivial to
store this small amount of data in there as well. I am not sure if it is
easier to maintain another account with a third party, write an HTTP POST
instead of a SQL INSERT, and HTTP GET + parsing + processing code instead of a
SQL SELECT with processing right in the query. One also already needs to do
backups of their main data and those backups would just include this
timeseries data as well. They most likely don't need another instance, another
DB etc. for this amount of data, they just stick it into their existing one.

My point being that yes I agree that people reach for third party services out
of convenience and not scalability needs BUT this is such a tiny piece of a
bigger puzzle that the overhead it comes with might not be worth it. Removing
the need to host the timeseries storage part of a product themselves does not
remove the need for a DB for 99.99% of projects. If it is trivial to do
something myself, then I will do it myself. But if doing it myself takes
considerable effort or requires ongoing maintenance overhead, then I might
consider a third party service. The higher the value of your service, the
easier it is to make the case for using it. E.g. letting people
store much more data, letting them process it etc.

Points pro using a third party service:

1. Too much time to develop
2. Too much maintenance overhead
3. Too difficult (to scale, develop, ...)
4. Economically unfeasible (because no economy of scale benefit, no synergies etc)

Points contra using a third party service:

1. Giving up control over data
2. Legal problems ("where is the data stored?", "am I allowed to hand this data out?")
3. Overhead of interacting with a third party (account maintenance, payments, integration)
4. Inflexibility ("what if I need to change my existing data?", "what if I need some new feature?")

As I see it, the points on the "pro" side don't apply at this scale. Given the
tiny scale and featureset, the decision isn't really about data volume but
about the overall "difficulty to build and keep running".

If I offer a logging service with a REST API to store and retrieve up to 1000
lines of text without any way to filter/search/aggregate/process, would you
use that because it's more convenient or would you just log things to a local
file or syslog and call it a day?

Like you said it's v1.0 and I'm sure you'll add more stuff, I'm just
suggesting that you'll definitely have to add quite a bit of stuff to catch
business users. I hope you can make it, I respect everyone who builds a
product themselves and tries to make a business out of it. Best of luck!

~~~
rlonn
I'm not sure you have a DB in all cases. The situation that made me think of
this service was exactly that: we (a startup business) had no easily
accessible db where we could place the data for our fledgling BI project. We
were using a Google spreadsheet and entering things manually into it. Things
such as new subscriptions and cancels each week, etc. Putting that data into
the main user db our SaaS product was using meant disturbing our developers
who were very busy and also not too keen on contaminating the production db
with BI data :) I think this lack of an existing db is common to many small
businesses and hobbyists who are collecting (or want to collect) various forms
of time series data, from various sources. You may be right that this is not a
service that will attract many business users willing to pay for the higher
tiers, that it is more suitable for small- (or tiny-) scale use, but that's
the beauty of an MVP - you don't spend too much effort on it and you let the
users guide development. If it turns out only serious IoT hobbyists buy, and
they only buy the "Personal" subscription, then that's the target audience and
the tier to focus on.

As for the log file example, sure I would pay someone to store 1000 lines of
log data _if the data was important to keep_. Log files rarely are. But BI
data, like new user registrations per day or week is something you never want
to lose, and it is not a lot of data. I would pay someone $50/year if it meant
I could store some simple BI metrics for my startup, at the resolution I
needed, and not have to worry about finding a place in some existing db for
it, or set up a new db. Come to think of it, this is another dogfooding
opportunity - I have to feed pushdata.io BI metrics into timeseries on
pushdata.io! :)

Anyway, I hear you and I thank you for the input. You may be right, we will
just have to see. I'll be happy whatever target audience I can find that are
willing to pay more than it costs me to serve them.

------
rlonn
Hi HN, I built this thing! The idea came a few years back when an angel
investor and I were involved in a BI project at the company I had founded, defining
and collecting some basic KPIs for the business. At one point when we were
thinking about where to store the very limited set of time series data that
was our complete set of KPI metrics, the investor said “why isn’t there some
service where you can just send a few numeric data points for storage, and get
them back, without a lot of hassle?”. I really liked it as I saw it as a very
simple idea, easy to implement but which still felt like it may be new/unique.

The new thing here, of course, is not the ability to store time series data.
The new/unique thing is the low barrier to entry. The API is extremely small
and simple to use and learn, and there is no registration required to start
using the service. I tried to make onboarding as frictionless as possible, and
I think it’s hard to make it simpler than it is now.

All feedback is greatly appreciated!

Some perhaps useful links:

Site: [https://pushdata.io](https://pushdata.io)

API docs: [https://speca.io/ragnarlonn/pushdata-io](https://speca.io/ragnarlonn/pushdata-io)

Blog article about making pushdata.io:
[https://bit.ly/2RGuIxc](https://bit.ly/2RGuIxc)

~~~
gravypod
It might be a good idea to add more RESTful handling of ingestion. Something
similar to the following

    
    
    POST /{account}/{series}

    [
      {
        "metrics": {
          "a": 10,
          "b": 20,
          "c": [1, 2, 3, 4]
        },
        "occurred": "2018-01-01 01:01:01"
      },
      {
        "metrics": {
          "a": 20,
          "b": 30,
          "c": [2, 3, 4, 5]
        },
        "occurred": "2018-01-01 01:01:02"
      }
    ]
    
    

The overhead of an HTTP call for every data point submission might not work
well in low-bandwidth situations. Also, in cases where you have intermittent
internet connections, supplying a timestamp for when an event happened is
helpful.

~~~
rlonn
There is an /api/timeseries endpoint where you can POST multiple data points
with one single call. While putting that endpoint under /{email}/{series} may
be more intuitive, I wanted to make it obvious that it is separate, as that
endpoint is a premium feature. See [https://speca.io/ragnarlonn/pushdata-io#extended-api](https://speca.io/ragnarlonn/pushdata-io#extended-api)
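For illustration, building a batch body for that endpoint could look like this — the field names below are guesses for the sketch, not confirmed against the Speca docs:

```python
import json

# Hypothetical batch body for a single POST to /api/timeseries;
# "name", "points", "time" and "value" are illustrative assumptions.
payload = {
    "name": "temperature",
    "points": [
        {"time": "2018-12-01T00:00:00Z", "value": 21.3},
        {"time": "2018-12-01T00:01:00Z", "value": 21.5},
    ],
}
body = json.dumps(payload)
print(body)

# Both points now travel in one HTTP request instead of two, e.g.:
#   curl -X POST -H "Content-Type: application/json" \
#        -d "$body" https://pushdata.io/api/timeseries
```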

------
GRiMe2D
I love the idea but some points:

\- Why does one POST data in the URL? I've tried to insert a "_" value and got
the homepage as HTML with a 200 status code. I suggest using the HTTP body for that.

\- GET requests don't seem to respect the "Accept:" HTTP header. What if I
want an XML-structured response? "Accept: application/xml"

\- It's unclear how to sign your requests and is there any authentication. If
someone knows my email and/or name of db, can they insert their arbitrary
data? Is there any private key or token exchange to verify POST requests?

Edit 1: Formatting

~~~
rlonn
Uhm, that POST should probably return something other than 200. The default is
to serve index.html if the server doesn't recognize the URL, but in this case
it is pretty obvious that you should get an error code back (400 - Bad Request
seems logical).

And yeah, JSON is the _only_ supported encoding right now. I originally had
the system output plain text actually, with an option to get JSON instead, but
people who saw that thought it was very weird so the plain text support was
dropped in a rewrite from Python -> Go. You're right that other common formats,
like XML or CSV, should be supported. I'll put it on the issues list.

Authentication: you post a data point to register an account. An account that
has just been created has no API key, no security: anyone can access it and
post or fetch data to/from it. To secure the account you have to confirm it,
which is done by opening the URL you get in your confirmation email. When that
URL is accessed, an API key will be created for you and the account is
confirmed.
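The flow described above can be sketched as follows; the URL shape and the apikey parameter name are assumptions for illustration, not taken from the API docs:

```python
# Sketch of the onboarding flow: push -> confirm -> authenticated requests.
# The per-email URL pattern and "apikey" query parameter are assumptions.
email = "me@example.com"
series = "signups"

# 1. First push auto-creates the account (no API key exists yet,
#    so anyone who knows the email could read/write the series).
create_url = f"https://pushdata.io/{email}/{series}?value=1"
print(create_url)

# 2. A confirmation email arrives; opening its link mints an API key.
# 3. From then on, requests carry the key and the account is secured:
authed_url = f"{create_url}&apikey=YOUR_KEY"
print(authed_url)
```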

Thinking about it, it might be better to just send out the API key, and then
consider the account confirmed once the first request using the API key comes
in. Hmm. I didn't want to send the key over email, but it is easy to change it
on the site, so maybe that's not a big issue. I'll have to think about that. I
understand the current process can be confusing because it is not standard.

Thanks for the feedback!

------
snicker7
Quite a few comments praising this service seem to be from users with only 1
comment.

~~~
rlonn
Maybe a mistake to tell everyone I know about it and that I'd make a launch
post on HN today... Sweden is awake right now, so probably over-representation
by locals but it should correct itself once the US wakes up. I _would_ like
some useful criticism.

------
kareemm
Love the idea, it was head-smackingly obvious when I saw your Show HN.
Handling this is a PITA in an app and I’d much rather pay for a 3rd party
service to handle it.

Some feedback:

\- the main benefit is that I don’t need to write or maintain code around
storing, retrieving, and hopefully visualizing time series data

\- pricing looks fine. Maybe even a little low for Business. But I’d have to
try it out to understand if the usage thresholds are too high / low. May want
to have a plan between 75 and 450 - that’s a big, painful jump

\- May want to think about spending some time editing your copy so it’s
crisper. A lot of it could be put into e.g. an FAQ (why not to use a temp
email, for example).

\- some examples in other languages would be useful (Ruby, Python, JS, etc)

\- I like that I can create an account with just a curl POST. Good stuff.

\- one concern is how fast is retrieval? If I’m retrieving data in a request
thread I want it to be FAST. One factor here is server performance. If you’re
using eg AWS to host that would give me some confidence that you can scale up
beyond a hamster wheel if necessary

\- no way to try this out on mobile, which is I suspect where a lot of people
will first see this. How can you remind them to come back when they’re at a
desktop? Collect an email and remind them maybe?

Haven’t tried the service yet but assuming it works as advertised it’s a good
start.

Congratulations!

~~~
rlonn
This is fantastic feedback - thanks!

The thresholds are pretty much guesswork - they're probably a bit off but I'll
adjust them when I know more about how people use the service. One thing that
might not be immediately obvious is the amount of data you can store at the
different price points. The number of time series _and_ the number of points
per series go up, which means a lot more potential data storage each time you
go up a level. More storage means more transfers, and so I set the levels and
pricing based on worst-case calculations for AWS bandwidth usage costs.

And yes, it's hosted on AWS at the moment, in Virginia. It should be reasonably
fast for US-based users, but if it turns out to be viable I might look at
distributing it more. As for scalability it is designed to be scalable, but
currently there is a bottleneck in the form of a lookup database that is hit
for each request for timeseries data. If I add caching of those lookups, it
should scale well horizontally - I just add more API servers and timeseries
databases.

I'll definitely be adding code examples. Perhaps also documentation/howtos for
interfacing via IFTTT, Zapier etc?

Oh, and the copy etc - I wanted to minimize the number of pages, so I put all
the most useful info on the front page :) I definitely agree that e.g. an FAQ
section would be nice.

Thanks again for some solid feedback!

------
StavrosK
This is very interesting, good job. I had made something a bit simpler back in
the day, it's a single server binary you can run (and I run a demo server you
can use) but I think yours has a few more features.

Maybe you can use it for inspiration, it's called Gweet:

[https://github.com/skorokithakis/gweet](https://github.com/skorokithakis/gweet)

~~~
rlonn
Ah, and built in Go also. I'll check it out :)

------
fudged71
I have been looking for a service exactly like this for storing/retrieving
data for Siri Shortcuts! Integrations with Firebase, Google Docs, etc are too
difficult, but this API is very easily supported by Shortcuts. In fact without
needing a user to register, you've opened up a huge group of users who can
just download a shortcut and use it without ever needing to touch your
service.

Fantastic job with this, I can't wait to try it

~~~
rlonn
That sounds like an interesting use case. If you want, I'll give you a free
premium subscription if I can do a case study on your problem and the
solution.

~~~
fudged71
Yes absolutely! How should I contact you?

~~~
rlonn
Send me something on hello at pushdata. io and I'll respond from my regular
email.

------
lozzo
My feedback: excellent work.

I hope you make sure that you get high in google ranking. A while back (about
a year ago) I had the need for something like this and kept searching on
google for such a service.

Surely somebody must have written it, I thought.

Well, I ended up accepting that either it did not exist or if it existed it
was impossible to find (which is the same thing from my angle).

So again hope people find you once the hackernews show traffic fades away...

cheers

~~~
rlonn
Yeah, we'll see. I'd need inlinks and without a ton of publicly accessible
content, that can be hard to get. I'm thinking maybe IoT communities would be
interested and may link to the site, as the product could be quite useful to
hobby IoT hackers with little or no budget for data storage, and not much
interest in being db maintainers.

------
gitgud
It says that an account is automatically created when you push to a url with
an _email_ not seen before. Then it goes on to say that there's no security,
it's not recommended, etc...

Why not make sure the email address has been validated before allowing it to
be automatically created?

Because right now someone could spam the sites servers with millions of
unverified emails...

~~~
rlonn
There are rate limits in place to prevent email spamming. The system stops
creating new accounts for a domain where the percentage of unconfirmed
accounts is too high. There is also rate limiting on client IP, preventing a
single client IP from creating lots of unconfirmed accounts. I am also planning to, but
haven't got around to it yet, add exponential backoff to email sendouts, so
that an increase in the sendout rate causes sendouts to be slower and slower
(until manual intervention to increase limits).

So, in short: yes, the system could be used for email spamming, but not very
effectively. There are better options if all you want to do is spam people.

------
jpmoyn
I've noticed with a lot of these "get started in seconds" services, you
inevitably run into challenges down the road when the complications of trying
to use them for a project come full swing. Not to mention the shaky nature of
cloud hosting with a company that could easily be out of business in 6 months.

~~~
rlonn
I agree, but often the alternative to using a service that offers painless
onboarding, is to not do anything. I.e. you want to do X, and you're trying to
find out how. You find product/service/component A, B and C that offers
frictionless onboarding, and it allows you to get started on X. If, on the
other hand, you find A, B and C that all require you to spend a couple of days
learning their APIs etc, then you may never get started on X because the task
becomes too big. There is a lot to be said for getting started on something.
Components can always be replaced.

------
jermaustin1
This reminds me a lot of another product called Metric Board:
[https://metricboard.io/how-does-it-work](https://metricboard.io/how-does-it-work)

~~~
rlonn
That's cool, I hadn't seen that one. It's very similar, but with more
functionality and seems to require signup (you have to have an API key to
store data points).

------
dawnerd
Wait so I can post data using any email with no security by default? That
seems like a really bad idea. I get you’re trying to be clever but damn. Can’t
possibly see that causing a problem.

~~~
rlonn
What is the threat scenario you see?

~~~
gitgud
Spamming the server with millions of unique but unverified email addresses, it
will store anything you throw at it...

~~~
rlonn
I wrote about that just now: there are rate limits in place to prevent it.
You'd have to go to quite a bit of trouble to use the system for any kind of
effective spamming: coming from many unique client IPs, sending only a few
emails from each of them, and not sending too many emails to the same domains
(e.g. spamming gmail.com would be impossible).

------
vaidhy
Maybe I am missing something..

I am curious how this is different from multiple other hosted graphite/grafana
solutions. All of them allow for time-series to be stored and
retrieved/graphed.

~~~
rlonn
pushdata.io is simpler. You don't need to set up an account, the API is very,
very small and simple, and the visualization is also much simpler than using
Grafana. It is meant to be as frictionless as possible to get started with.

------
jphelps
This will be immediately useful for all of my little projects. Looking forward
to metrics-without-headache. Thanks!

------
nissehulth
I liked the part about replacing the React code. Small is beautiful.

~~~
rlonn
Sometimes I worry that I'm too old-school, doing things like throwing out
frameworks everyone else is using, and I see young(er) developers being
incredibly productive when using those frameworks plus components, libraries,
etc but then I also see that a lot of what they create looks the same and
works the same, contains the same bugs, because (IMO) they're sometimes
over-using external code and using cannons to shoot flies. I'm probably a bit too
much the other way around, but it seems to work for me. I tend to be quite
focused on minimizing external dependencies and only use what is absolutely
necessary.

~~~
tnr23
I absolutely agree! Built a 6-figure-MRR SaaS with exactly this attitude.

~~~
rlonn
My attitude, or the other one? :)

~~~
tnr23
yours :)

------
rajadigopula
Nice idea. Easy to plug in and use with IoT devices.

------
santypk4
Really cool!! I'm looking forward to using this on my next project

------
jajo
Looks marvellous!

------
MungoBBQ
I like that I get a finished cURL example right away - so easy to start using.
:)

------
alex_112
Seems like a fantastic service!

