
Ask HN: Is there server-side software that we are missing in 2018? - borplk
It seems like there are so many choices in each category that there's nothing left to do.

I mean things like RDBMS, NoSQL databases, time-series databases, key-value stores, message queue, web servers.

I remember many years ago if you were building something you'd notice the missing solutions and tools because there were things you couldn't do easily (like lightweight application-level caching before redis/memcached popularity).

Nowadays it seems like there's nothing missing.
======
gfodor
I don't think anyone has solved the issues raised in Out of the Tar Pit in the
data storage space. We still spend a ton of effort, perhaps more than ever
before, wrangling incidental complexity. In 2018, I should be able to define a
simple relational schema in 30 seconds and start using it with effectively
zero constraints around data access patterns, transaction volume, and data
scale, and with no design decisions needing to be made around these bits of
incidental complexity. There should be one gigantic knob I can turn to spend
more money and give the system more resources, and it should be damned cheap.

We are so far from that reality that it seems obvious there is still a ton of
work to do. You can see bits and pieces of solutions but if your product asks
me to define indexes, tune queries, determine sharding keys, write against
computer-centric (vs human-centric) APIs, cannot deal with change easily,
requires babysitting, or falls over after hitting some incidental tipping
point relative to the resources available to a single machine, you've not hit
it.

I have not used Google Spanner, but from 30k ft it seems like the closest thing
out there to the idealized case -- but being closed source and centralized, it
is not really a "solved problem" imho.

~~~
TheHam
AWS just released AppSync. You can define a relational GraphQL schema and hook
it up to a number of sources to feed data to it. Then, AppSync will automatically
turn your schema into a GraphQL API where clients can make queries, mutations,
and subscribe to data changes with no constraints regarding data access
patterns. Subscriptions scale without any work from you. All of these
operations use the relational schema, which is self-documenting. You can
change the schema any time you want.

Using the GraphQL API requires no knowledge of the backend data store.
Therefore, clients and consumers of your API do not need to know about shards,
indices, SQL queries, etc.

If you have an existing DynamoDB table, you can automatically generate a
GraphQL schema/API from it too.

[https://aws.amazon.com/appsync/](https://aws.amazon.com/appsync/)

disclaimer: i work on aws appsync

~~~
davidjnelson
This looks awesome, except that it's still in preview. Seems a bit early to
bet on it - any idea when there will be assurances that it will continue to be
supported long term?

------
chubot
I want a distributed (not cloud) file system that handles many files and WAN
latency (i.e. not HDFS). It might be a cross between Git and BitTorrent.

It feels like deb repos, PyPI, NPM, CPAN, CRAN, etc. should be put in there,
with the addition of binaries for popular architectures. And probably Docker-
like images, although I think if they are not opaque blobs, it would be better
for rsync-like differential compression (which Git implicitly provides).
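
To make the "rsync-like differential compression" idea concrete: the receiver advertises hashes of the blocks it already holds, and the sender ships only the blocks whose hashes are missing. A minimal Python sketch, assuming fixed-size blocks (real rsync uses a rolling weak checksum so matches survive insertions that shift data, which this sketch does not handle); the function names are hypothetical:

```python
import hashlib

# Assumption: fixed-size blocks. Tiny here for demonstration; real systems
# use blocks in the KiB-to-MiB range.
BLOCK_SIZE = 4


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and hash each one."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_send(new: bytes, have_hashes: set[str],
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in `new` whose hash the receiver does not have yet."""
    return [
        idx for idx, h in enumerate(block_hashes(new, block_size))
        if h not in have_hashes
    ]


old = b"AAAABBBBCCCC"
new = b"AAAABBBBDDDD"
print(blocks_to_send(new, set(block_hashes(old))))  # [2]
```

Only the changed third block crosses the wire; the same per-block hashes are what would let a BitTorrent-style swarm fetch blocks from many peers at once.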

There will be some small files and some big files. I want it to be like Git so
I can clone locally, not just go to the cloud. The way that Git is trivial to
set up and clone through SSH is nice too.

It probably has to have a notion of "user", like a local file system. (This
makes the problem a lot harder; git doesn't really have permissions.)

BTW Julia's package manager just used git, but I watched a talk that said this
ended up being a really bad idea, especially on Windows.

As far as I understand IPFS has some of these properties. Has anybody used it?
Could I use it for the package repository use case?

I don't know much about it, but Project Atomic sounds similar too:
[https://www.projectatomic.io/](https://www.projectatomic.io/)

Any other projects that seem like a close fit?

BTW I think this would also be useful in the data center, as some companies
like Twitter apparently use BitTorrent in data centers to start large jobs
quickly (i.e., replicate the same 500 MB binary to 1000 machines).

~~~
burkemw3
A few random thoughts:

IPFS sounds a lot like what you want. Tahoe-LAFS plays in this space a bit.

The newer distributed file systems I've seen don't like permissions. They like
cryptographically-backed capabilities. You have the permission to read the
file because you have the ability to decrypt, through the key. (Of course, key
management is easy </sarc>).

Some user-facing distributed filesystem talks drift into FUSE (or similar)
territory. A Tahoe-LAFS dev talks about how users probably don't actually know
what they want:
[https://plus.google.com/108313527900507320366/posts/ZrgdgLhV...](https://plus.google.com/108313527900507320366/posts/ZrgdgLhV3NG)
(QUIBBLES: REAL FILESYSTEM VS. STORAGE APP section). This is probably less
relevant for a package manager.

The first time I read about BitTorrent based deployment was from Facebook.

~~~
chubot
Thanks for the link! I've heard about Tahoe LAFS, but like a lot of these
projects (Upspin, IPFS), I don't actually know anyone who uses them!

------
freehunter
I'm a big believer in logging, and I find it hard to believe that collecting
and monitoring logs needs to have a server with 32GB of RAM and massive Java
and Elasticsearch back-ends like ELK, Splunk, Graylog, or similar monitoring
software.

I've been searching for something self-hosted that can run in my server's
spare capacity and monitor the OS and application logs with a simple searching
interface (just a web interface to grep would be handy) and come up short. If
I can't host it on the cheapest DO plan, it's out of my budget. I really don't
want to have to build it myself but I'm just about at that point.
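
The "web interface to grep" part can be small: a regex scan streams the log line by line, so memory stays flat regardless of log size. A minimal Python sketch of just the search function, assuming logs are plain-text lines; the web front end and collection from multiple servers are left out, and `grep_logs` is a hypothetical name:

```python
import re
from typing import Iterable, Iterator


def grep_logs(lines: Iterable[str], pattern: str, ignore_case: bool = True,
              limit: int = 100) -> Iterator[tuple[int, str]]:
    """Yield (line_number, line) pairs matching `pattern`, like `grep -n`."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    found = 0
    for lineno, line in enumerate(lines, start=1):
        if regex.search(line):
            yield lineno, line.rstrip("\n")
            found += 1
            if found >= limit:  # cap results so a broad query stays cheap
                return


sample = ["boot ok\n", "disk ERROR on sda\n", "net up\n", "error: timeout\n"]
for n, line in grep_logs(sample, "error"):
    print(n, line)
```

Passing an open file object (for example `open("/var/log/syslog")`, a path assumed here) instead of a list keeps it streaming. There is no index, so each search is a full scan; that is the trade-off against the heavyweight stacks.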

~~~
Artemis2
oklog might be for you?
[https://github.com/oklog/oklog](https://github.com/oklog/oklog)

It’s still new, but the architecture is sound and it worked well for me on a
small single-node deployment (about 30 clients sending logs).

~~~
Blindedwino
It looks promising. Now if it could integrate easily with syslog-ng/rsyslog,
that would be perfect.

------
m_ke
Machine learning model management and serving service.

I haven't seen a decent open source framework that makes it easy to package a
trained model with preprocessing and postprocessing steps and deploy it behind
an API.

Adding performance tracking and model validation on top of that would be
great.

Then there are things like queuing/batching and autoscaling.

Closest thing that comes to mind right now is TensorFlow Serving and their k8s
stuff (which it looks like they just renamed to tf-operator and moved to the
kubeflow org: [https://github.com/kubeflow/tf-operator](https://github.com/kubeflow/tf-operator))
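
A minimal sketch of the packaging described above, bundling preprocessing, the model, and postprocessing so they are versioned and served together. `predict` stands in for whatever batch-inference call a real framework provides (an assumption, not any library's API); serving over HTTP, performance tracking, and autoscaling are out of scope. Taking a batch of requests is the hook where queuing/batching would plug in:

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class ModelPackage:
    """Bundle a trained model with the pre/postprocessing it was trained with."""
    name: str
    version: str
    preprocess: Callable[[Any], Any]                     # raw request -> features
    predict: Callable[[Sequence[Any]], Sequence[Any]]    # batch in, batch out
    postprocess: Callable[[Any], Any]                    # model output -> response

    def serve(self, requests: Sequence[Any]) -> list[Any]:
        """Run a batch of raw requests through the full pipeline."""
        features = [self.preprocess(r) for r in requests]
        raw = self.predict(features)
        return [self.postprocess(y) for y in raw]


# Toy stand-in model, purely illustrative.
pkg = ModelPackage(
    name="text-length-demo",
    version="1",
    preprocess=lambda text: len(text),
    predict=lambda xs: [x * 2 for x in xs],
    postprocess=lambda y: {"score": y},
)
print(pkg.serve(["ab", "abc"]))  # [{'score': 4}, {'score': 6}]
```

Keeping all three steps in one versioned object avoids the classic bug where the serving path preprocesses inputs differently from training.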

~~~
mpeter88
The model development pipeline is still a far cry from the maturity of the
software development pipeline, and will have to get there in order to reduce
the hands-on heavy lifting required for model development and deployment.

Along with packaging a trained model are things like: Snapshot/versioning of
the training and test data used to create the model, versioning of the model,
storing versioned models in a model registry, auto-deploying models from the
registry to target environments, telemetry from deployed models.

Closest I've found is
[https://github.com/mitdbg/modeldb](https://github.com/mitdbg/modeldb), and
I've spoken to the woman leading the effort. They still have data versioning
as an open question, and don't see the need. But there are training set
modification, results RCA, and other use cases that drive the need to catalog
training/test data with the model that results.
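
One way to catalog training/test data with the model that results, sketched in Python under loud assumptions: datasets are small lists of JSON-serializable rows, and the registry is in memory. Each version stores a content hash of its training and test data, so "which models were trained on exactly this data?" becomes a lookup. A real registry would persist entries and snapshot the data itself, not just its hash; all names here are hypothetical:

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(rows: list[dict]) -> str:
    """Content hash of a dataset; identical data always gives the same hash."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class ModelRegistry:
    """In-memory registry: each model version records the exact data it saw."""
    entries: dict[tuple[str, int], dict] = field(default_factory=dict)

    def register(self, name: str, train_rows: list[dict],
                 test_rows: list[dict], metrics: dict) -> int:
        # Next version number for this model name (1 if none exist yet).
        version = 1 + max((v for (n, v) in self.entries if n == name), default=0)
        self.entries[(name, version)] = {
            "train_hash": fingerprint(train_rows),
            "test_hash": fingerprint(test_rows),
            "metrics": metrics,
        }
        return version

    def versions_trained_on(self, name: str, train_rows: list[dict]) -> list[int]:
        """Every version of `name` trained on exactly this dataset (for RCA)."""
        h = fingerprint(train_rows)
        return sorted(v for (n, v), e in self.entries.items()
                      if n == name and e["train_hash"] == h)
```

Hashing only tells you whether two datasets are identical; a modified training set produces an unrelated hash, so working out *what* changed still needs the stored snapshots.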

It'll get there. Just a question of when and how.

------
simonw
I'm still waiting for a rock-solid scalable open source graph database in the
mold of the Freebase database engine or the amazing graph database that
Facebook have built for themselves. I'm very excited about dgraph as an option
here but I think it's still an area that is very open for new entrants.

~~~
rambojazz
This would be really useful. Now there are Fuseki and Janus[1]. Years ago
there was 4store [2]. There is also gStore in development [3]. They all
require a lot of help. Would be nice if somebody could pick up and help one of
these.

[1] [https://jena.apache.org/index.html](https://jena.apache.org/index.html),
[http://janusgraph.org](http://janusgraph.org)
[2] [https://github.com/4store/4store](https://github.com/4store/4store)
[3] [https://github.com/Caesar11/gStore](https://github.com/Caesar11/gStore)

~~~
kawera
Adding Cayley to the mix:
[https://github.com/cayleygraph/cayley](https://github.com/cayleygraph/cayley)

------
simonw
An open-source platform for self-hosting function-as-a-service - something
that provides the tooling for easily saying "deploy this function, auto scale
it, route HTTP traffic to it, now atomically replace it with this new version"
without having to lock yourself in to Google/AWS/Azure.
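
The "now atomically replace it with this new version" step can be illustrated in isolation. A minimal Python sketch, assuming in-process handlers rather than containers: the router swaps a (version, handler) pair under a lock, so each request sees either the old or the new version, never a mix. This is not OpenWhisk's API or any other platform's; the class and method names are hypothetical, and scaling and HTTP routing are left out:

```python
import threading
from typing import Callable

Handler = Callable[[dict], dict]


class FunctionRouter:
    """Route requests by function name; swap implementations atomically."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._routes: dict[str, tuple[str, Handler]] = {}

    def deploy(self, name: str, version: str, handler: Handler) -> None:
        # Replacing the whole tuple under the lock means a reader gets either
        # the old (version, handler) pair or the new one, never half of each.
        with self._lock:
            self._routes[name] = (version, handler)

    def invoke(self, name: str, event: dict) -> dict:
        with self._lock:
            version, handler = self._routes[name]
        # The handler runs outside the lock: in-flight calls on the old
        # version finish normally while new calls go to the new version.
        result = handler(event)
        return {"version": version, **result}
```

Usage: `router.deploy("greet", "v1", fn1)` and later `router.deploy("greet", "v2", fn2)`; callers of `router.invoke("greet", event)` never need to change.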

~~~
andrenotgiant
OpenWhisk is a bit clunky, but it does exist!
[https://openwhisk.apache.org/](https://openwhisk.apache.org/)

~~~
simonw
That looks like exactly what I'm talking about:
[https://github.com/apache/incubator-openwhisk/blob/master/do...](https://github.com/apache/incubator-openwhisk/blob/master/docs/actions.md)

The space is still early enough that there is a lot of value in competing
options.

------
jastr
Some of the biggest advancements in recent years have come less from
technological breakthroughs, and more from improvements in designs and
abstractions. Tools like Ruby on Rails, GraphQL, and even AWS, while
impressive technically, are breakthroughs because they improved developer
efficiency. They also weren't "needed" until they were built, and have allowed
many developers to work on broader parts of the stack.

Also, today's tech solves today's problems.

------
mrfusion
Even though we have Postgres and MySQL, they really don’t seem well configured
out of the box, and you’ve got to wade through a bunch of settings files and
understand what vacuuming is.

I think we need a self-tuning RDBMS. It could watch and adapt to usage patterns
and available resources.

~~~
davidjnelson
Isn't that what aws aurora and google cloud spanner offer?

------
simonw
There might be something exciting to build to help implement ludicrously fast
Google-style autocomplete / typeahead search. I've tried using MySQL,
Elasticsearch, PostgreSQL with trigram indices... they can be made to work,
but I've never felt that I'm anywhere near the quality of whatever it is
Google are doing here.
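
For reference, the core data-structure trick behind fast typeahead is that all completions of a prefix form one contiguous range in sorted order, found with a binary search. A minimal in-memory Python sketch, assuming a static list of (phrase, popularity) pairs; it is not Lucene's AnalyzingSuggester (which builds a finite-state transducer and runs text analysis), and production systems usually precompute the top results per prefix instead of ranking all matches at query time:

```python
import bisect


class PrefixIndex:
    """Sorted-array prefix lookup: O(log n) to find the first candidate."""

    def __init__(self, terms: list[tuple[str, int]]) -> None:
        # terms are (phrase, popularity); keep phrases sorted for bisect
        self._entries = sorted((t.lower(), score) for t, score in terms)
        self._keys = [t for t, _ in self._entries]

    def suggest(self, prefix: str, k: int = 5) -> list[str]:
        prefix = prefix.lower()
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for phrase, score in self._entries[start:]:
            if not phrase.startswith(prefix):
                break  # sorted order: once a key stops matching, none later do
            matches.append((score, phrase))
        # rank the prefix matches by popularity, highest first
        return [p for _, p in sorted(matches, key=lambda m: -m[0])[:k]]


idx = PrefixIndex([("new york", 90), ("newark", 40), ("new delhi", 70), ("boston", 50)])
print(idx.suggest("new"))  # ['new york', 'new delhi', 'newark']
```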

~~~
lowry
You need to go deeper and use AnalyzingSuggester from Lucene. It is as fast as
it gets. Also, do not forget about the web part of the equation. Using HTTP/2
helps, as well as disabling buffering all the way through.

------
Kagerjay
I would say anything in the video / 3D / imaging service platforms still leaves
a lot to be desired.

Basically anything that potentially touches FFMPEG.

I can name a few examples that I still think need improvement:

\- Online gif editors

\- Video editing / clipping

\- Background image editing / online photoshop equivalents

\- Better alternatives than lucidpress / adobeIndesign for catalog page /
brochure creation

\- Machine learning / deepfake online tools; these are all mostly driven client
side

\- Pretty much anything client side that's not yet server-side is open game, I
would say

\- PDF markup tools could be better for online-based services, especially for
architectural design

\- CAD-based online programs for retail, so customers can DIY build their own
warehouse or layout schemas, are lacking

\- Better online RDBMS. Currently, there's just Airtable, and it's lacking some
core features like referential integrity

\- Integrating spaced-repetition learning in most education-based services
(lynda.com, Pluralsight, etc.)

\- Managed ecommerce cart / hosting services. It's a well-understood problem
that should technically be easy for a client to do.

Again, I would say there is definitely still room for profitable services that
touch some form of FFMPEG for video / image editing / 3D. There's a huge
untapped market in combining all of these services in one package.

~~~
imhoguy
I would add on-the-fly video transcoding. FFMPEG would need some efficient
context state serialization to bring resumability/seekability and at the same
time to stay low on resources.

------
takinola
I should be able to copy and paste a server configuration and create an
identical copy of my server. Right now, the only solutions I see are
generating images (too big and cumbersome) or writing scripts (too complex).

I would like to be able to type in a single command and replicate all the
packages, services and configuration present in a particular server to a new
target system.
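
The "single command" wish is essentially a declarative diff: capture what the source server has, compare it against the target, and emit the steps that close the gap. A minimal Python sketch of the diff half only, assuming a hypothetical manifest of package versions, enabled services, and config-file hashes; capturing the manifest and executing the steps (package manager, init system) are omitted, and files written outside any package manager, the problem raised in the reply below, are exactly what it cannot see:

```python
from dataclasses import dataclass


@dataclass
class Manifest:
    packages: dict[str, str]     # package name -> version
    services: frozenset[str]     # services that should be enabled
    files: dict[str, str]        # config path -> content hash


def plan(source: Manifest, target: Manifest) -> list[str]:
    """Steps that would make `target` match `source` (a declarative diff)."""
    steps = []
    for pkg, ver in sorted(source.packages.items()):
        if target.packages.get(pkg) != ver:
            steps.append(f"install {pkg}={ver}")
    for pkg in sorted(target.packages.keys() - source.packages.keys()):
        steps.append(f"remove {pkg}")
    for svc in sorted(source.services - target.services):
        steps.append(f"enable {svc}")
    for svc in sorted(target.services - source.services):
        steps.append(f"disable {svc}")
    for path, digest in sorted(source.files.items()):
        if target.files.get(path) != digest:
            steps.append(f"copy {path}")
    return steps
```

Running `plan` again after the steps are applied should return an empty list; that idempotence is what makes the approach declarative rather than scripted.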

~~~
noir_lord
You can get a fair way down that path with Ansible _if_ you are very careful,
but it still requires too much work; modern operating systems and packaging
were not built for idempotent rollbacks.

We try to treat pets like cattle then wonder why they bite us (to reverse the
usual refrain).

I agree though, it _should_ be declarative but everything shits everything
else all over the filesystem.

------
jedberg
A workload-aware data distribution proxy.

Let me explain. A lot of people talk about multi-cloud these days. Either AWS
and Azure, or AWS and their own datacenter, or whatever.

While it's really easy to send compute jobs to one cloud or the other based on
which one is best suited for the job, the big issue is that your compute job
will need data. Right now your choices are to have a copy of all your data in
all the clouds (a very expensive proposition since you'd be constantly
shipping updates across very expensive outbound connections) or to have all
your data in one place and have the compute jobs reach out across the internet
to get it (also expensive and now there is latency for every job too).

I want a proxy that is smart enough to say "Workload X is usually done on AWS,
and requires data points Y and Z, so make sure Y and Z are always up to date
on AWS, but lazy update Y and Z to GCE in batches through the cheapest direct
connect possible".
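
The decision part of that proxy can be sketched as a policy over observed runs: for each dataset, the cloud that uses it most gets eager replication, and every other cloud gets lazy batched updates. A minimal Python sketch; the majority-of-runs heuristic is an assumption, and the "cheapest direct connect" cost model and the replication itself are left out:

```python
from collections import Counter, defaultdict


def placement_plan(runs: list[tuple[str, str, set[str]]],
                   clouds: set[str]) -> dict[str, dict[str, str]]:
    """Given observed (workload, cloud, datasets) runs, return for each dataset
    a per-cloud policy: 'eager' where it is used most, 'lazy' elsewhere."""
    usage: dict[str, Counter] = defaultdict(Counter)
    for _workload, cloud, datasets in runs:
        for ds in datasets:
            usage[ds][cloud] += 1
    plan = {}
    for ds, by_cloud in usage.items():
        hot_cloud, _ = by_cloud.most_common(1)[0]
        plan[ds] = {c: ("eager" if c == hot_cloud else "lazy") for c in sorted(clouds)}
    return plan


runs = [("X", "aws", {"Y", "Z"}), ("X", "aws", {"Y", "Z"}), ("R", "gce", {"Z"})]
print(placement_plan(runs, {"aws", "gce"})["Z"])  # {'aws': 'eager', 'gce': 'lazy'}
```

Because the plan is recomputed from observed runs, a workload that migrates to another cloud would shift its datasets' eager replica along with it.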

~~~
matteuan
One project that is aiming in this direction: [http://seaclouds-project.eu/](http://seaclouds-project.eu/)

~~~
jedberg
That's an interesting start! Scanning briefly it looks like I still have to
tell it what data and workloads go where -- it isn't figuring it out
automatically. Which is the holy grail I'm looking for.

------
tmaly
Given the sheer number of services on AWS, I would like an expert system that
would query me for requirements and then suggest a set of possible services to
use together to solve my problem.

~~~
wenc
Meanwhile you can use this to get a quick overview of the services offered.
Amazon should put this up on their AWS main page.

AWS in Plain English [https://www.expeditedssl.com/aws-in-plain-english](https://www.expeditedssl.com/aws-in-plain-english)

------
simonw
Serving machine learning models in production is still something that appears
not to have an obvious correct solution.

~~~
agibsonccc
Hi fellow YC alumni! See the pitch here:
[https://news.ycombinator.com/item?id=16399326](https://news.ycombinator.com/item?id=16399326)

We'll be supporting PMML and the like as well. The goal is to hit the simple
things rather than perpetuating the latest hype like the AutoML stuff people
are going on about currently. If you'd be interested, would be happy to have a
conversation to go over what we're trying to do. We hope to just provide a
platform-neutral tool for building and deploying models, similar to SageMaker
(but cloud-agnostic).

------
ioddly
I think what I've been missing, and trying to do a better job at, is
understanding the awesome power of existing and mature solutions. Things like
postgres's LISTEN/NOTIFY and so on.

------
trjordan
10 years ago, I learned about the idea of elastic infrastructure. Services
with load balancers in front of them, hosts that come and go easily, a
structured way to communicate between services.

At the time, that was nginx + manual autoscaling, then it was ELBs and
autoscaling groups, now it's kubernetes and containers, maybe hosted. It's
still not there.

I'm excited about the software that makes that operable at scale. It seems
like a service mesh is a good idea. It seems like mutual security between
services is a good idea. It seems like storing routing configuration in a
separate control plane that is executed in a data plane like Envoy is a good
idea.

There's a bag of software at CNCF that's loosely organized around this, but I
don't think the "just deploy some code, it can scale and you can have tons of
services doing that with good visibility and operability" is quite there. I'm
really excited about Envoy, but I don't think there's a good control plane for
it. I'm part of a company that's working on a commercial implementation
(turbinelabs.io), and Istio is in a similar space.

There's still work to be done!

~~~
jacques_chester
I suspect this niche will emerge this year.

------
dozzie
> It seems like there are so many choices in each category that there's
> nothing left to do.

Log storage and search for _structured_ logs (e.g. JSON or CEE, not merely
string blobs). We have paid solutions (Splunk, Loggly, Papertrail), and then we
have Elasticsearch, which gets worse in this use scenario with every release.

Message stream processing engine that doesn't require restarting to add a
query or data sink, so you could build monitoring system around it. In fact, a
monitoring system designed to allow you to easily add your custom processing
or data sink.
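
A minimal in-process Python sketch of the "add a query or data sink without restarting" idea for structured logs: each query is a predicate over parsed JSON fields paired with a sink, and queries live in a registry that can change while events keep flowing. The class is hypothetical, single-process, and in memory; persistence, distribution, and a query language are not addressed:

```python
import json
from typing import Callable

Predicate = Callable[[dict], bool]
Sink = Callable[[dict], None]


class StreamProcessor:
    """Route structured (JSON) log events to sinks. Queries can be added or
    removed while the processor keeps running, with no restart."""

    def __init__(self) -> None:
        self._queries: dict[str, tuple[Predicate, Sink]] = {}

    def add_query(self, name: str, predicate: Predicate, sink: Sink) -> None:
        self._queries[name] = (predicate, sink)

    def remove_query(self, name: str) -> None:
        self._queries.pop(name, None)

    def feed(self, line: str) -> None:
        event = json.loads(line)  # structured log: fields, not a string blob
        # Snapshot the queries so add/remove during dispatch is safe.
        for predicate, sink in list(self._queries.values()):
            if predicate(event):
                sink(event)
```

A monitoring alert then becomes one `add_query` call, for example a predicate `lambda e: e.get("level") == "error"` with a sink that pages someone.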

Infrastructure inventory that can be both filled by hand and kept updated by
machine and that can be queried from script or browsed by human. For that it
would be useful to have a good topic maps engine, which is another missing
thing.

OS updates manager that can handle more than just Red Hat/CentOS (Red Hat
Satellite or Spacewalk) or just Ubuntu (Canonical Landscape), and while at it,
one that doesn't try to be an underdeveloped configuration management tool (like
CFEngine or Puppet) and an underdeveloped deployment tool (like Ansible), but can
cooperate with them.

And there's much more where these came from.

------
ex3ndr
I really want to simplify writing backend logic and implement everything as
functions that react to events and produce new ones. This looks very much like
the Actor Model (Akka) plus Redux's reducers: we have just a bunch of
state-changing rules, and they can be executed reliably and with decent
performance.

Something very simple (for example, for a build server):

Build Failed (Event) -> Assign responsible user for a crash -> Send
Notification -> Deliver Notification via user's configured notification
systems (email, push, sms..)
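
The build-server chain above can be sketched as a table of handlers, each consuming one event and emitting zero or more new events, run until the queue drains. A minimal Python sketch under stated assumptions: the responsible-user lookup is hard-coded and the user's channels come from a hypothetical `users` dict. It covers only the write-side flow, not the read-side problems described next:

```python
from collections import deque
from typing import Callable

Event = tuple[str, dict]                  # (event type, payload)
Handler = Callable[[dict], list[Event]]   # consumes one event, emits new ones


def run(handlers: dict[str, Handler], initial: Event) -> list[Event]:
    """Process events breadth-first until no handler emits anything new."""
    queue = deque([initial])
    log = []
    while queue:
        kind, payload = queue.popleft()
        log.append((kind, payload))
        handler = handlers.get(kind)
        if handler:
            queue.extend(handler(payload))
    return log


# Hypothetical rules; a real system would look these up, not hard-code them.
users = {"alice": ["email", "push"]}  # user -> configured notification channels

handlers: dict[str, Handler] = {
    "BuildFailed": lambda p: [("ResponsibleAssigned", {**p, "user": "alice"})],
    "ResponsibleAssigned": lambda p: [("NotificationSent", p)],
    "NotificationSent": lambda p: [
        ("Delivered", {**p, "channel": ch}) for ch in users[p["user"]]
    ],
}

for kind, payload in run(handlers, ("BuildFailed", {"build": 42})):
    print(kind, payload)
```

The returned log is exactly the event stream an event-sourced store would persist; building queryable views from it is the read-side work the comment goes on to describe.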

This is event sourcing, but event sourcing works well only for one part of the
platform, the write side; the read side is sometimes too hard to implement.
Sometimes there are problems with eventual consistency, and you actually need
to wait while some of the reducers in the chain process the event. Starting
from this point, everything becomes toooo slow to develop, and you basically
start to reinvent the wheel: this is a database engine, de facto. Meh.

~~~
jacques_chester
I'm a one-eyed Concourse fan, which implements a separation of state and logic
that allows designs like this. I've been pitching pretty heavily the concept
of creating or converging to parity with the FaaS I'm assigned to work on.

So: maybe. We might pull this off.

------
joshavant
I'm a front-end developer.

I'd like an OSS solution that will allow me to deploy an arbitrary server-side
service - be it a Ruby, Python application or even a package like OpenVPN - to
some cloud infrastructure, easily.

This solution should spin up a hardened OS distro (CIS-compliant, maybe?),
provision it with my arbitrary services (using Ansible or Chef or something),
and deploy it to AWS or some cloud infrastructure for me (using Terraform or
something).

All these component pieces exist, but nothing ties _all of them_ together for
an easy deploy for a front-end developer like me.

(And, I know things like Heroku and CodeDeploy exist, but I dislike lock-in
and they nearly universally come with their own restrictions, like lack of
support for server-side Swift applications or custom services like OpenVPN or
git-annex.)

EDIT - I'm strongly considering taking some time off to write this soon, so
get in touch if this is something you're interested in! Contributing or using!

~~~
jacques_chester
There's an overlap with what BOSH does. Starts with a stemcell, compiles your
packages against it, spins up whole VMs configured with those packages.

It also adds monitoring for both VMs and processes.

Don't underestimate what you get from a PaaS like Heroku, though. If you're
able to stick to 12-factor apps, the lock-in is pretty mild -- you should be
able to hoist your skirts and move to Cloud Foundry or even OpenShift without
too much pain.

Disclosure: I work for Pivotal, we work on BOSH and Cloud Foundry. We compete
with Red Hat and Heroku.

------
mabynogy
A schemaless and automated middleware for scripting languages (js, php,
lua...).

The tool inspects the source code and wraps calls across processes (no IDL).

I wrote the beginning of a description here: [http://dpt.slasheva.com/project-ideas.html#middleware](http://dpt.slasheva.com/project-ideas.html#middleware)

------
slake
A CMS backend to which I can sew on any template based frontend I want without
having to follow the strict workflow set by the backend.

The backend would just serve content when asked (JSON prolly), the frontend
could be anything you desired.
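
The decoupling described here boils down to a backend that returns content as data and knows nothing about templates. A minimal Python sketch of the request handler only, assuming a hypothetical in-memory `CONTENT` store; wiring it to an HTTP server and adding authoring/workflow features are left out:

```python
import json

# Hypothetical in-memory content store; a real backend would use a database.
CONTENT = {
    "about": {"title": "About us", "body": "We make things.", "tags": ["company"]},
}


def content_response(slug: str) -> tuple[int, str]:
    """Return (HTTP status, JSON body) for a content request. No templates or
    workflow here: any frontend can fetch this and render it however it likes."""
    item = CONTENT.get(slug)
    if item is None:
        return 404, json.dumps({"error": "not found", "slug": slug})
    return 200, json.dumps({"slug": slug, **item})
```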

------
LeonidBugaev
The area of automated testing and developer tools in general is hugely
underestimated: tools that help you write, format, and verify code,
automatically detect issues from stack traces or other sources like tcpdumps,
smart fuzzers, etc. The market is huge.

I've had a personal project in this area for 5 years,
[https://goreplay.org](https://goreplay.org), and I'm investigating ways to
automatically find issues in web applications. The project has gotten
traction, but I do not see any competition so far. There are a lot of reasons
for this: in such under-researched areas, being a pioneer requires a different
mindset from both the project owner and the end user. And this is really hard
to market.

------
ntolia
I believe the problem has now shifted to a slightly higher level,
especially when working in a microservice/containerized world. For example,
given the number of specialized services you mentioned, we see more
applications with polyglot persistence stacks underneath. If they form a part
of the same application, how do you collectively manage them? What does it
mean to take a consistent snapshot? These and a bunch of other similar
questions need to be answered.

There is some work we are doing ([https://kanister.io](https://kanister.io))
to help with these issues but there are a lot of emerging solutions in this
space. Look at CNCF's landscape for some more detail.

------
stocktech
I feel like we're in a consolidation phase. We have a ton of tools/software
and the big developments have been in how we use those tools aka devops. I'd
put things like kubernetes/docker in this category, but there's obviously huge
room for growth around this tech.

I do think there's room for new tools tho, but by definition, they're on the
cutting edge and not especially visible if you're not looking. Things like
stream processing are already incredibly useful and only getting better.
Machine learning could be in this list too. It might not be a 1000%
improvement like caching, but that's part of a maturing industry.

------
maratd
> Nowadays it seems like there's nothing missing.

There's lots of things missing. There's just nothing missing in established
categories. Why would there be?

Start your own category, create software for it, then convince everyone that
the category is important.

~~~
thesmallestcat
Great advice! I'd venture that managing large files is an unsolved problem.
It's a hack in most version control systems, and uploading/downloading files
from a host, even S3, is a slow, serial process. Same for checksumming.
Network speeds have more than caught up, and large files are a frequent
process bottleneck. Something that makes it easy to manage and consume large
files could be a big deal. It probably would require a new application
protocol, maybe even a new filesystem similar to XFS.
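
On the checksumming part: splitting a file into fixed-size chunks lets each chunk be hashed (or transferred) independently and in parallel, and a hash over the chunk hashes gives one value to compare. A minimal Python sketch; the 8 MiB chunk size and worker count are arbitrary assumptions. CPython's `hashlib` releases the GIL while hashing large buffers, so threads do run in parallel here, though disk throughput may still be the real limit:

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 8 * 1024 * 1024  # 8 MiB per chunk: an arbitrary choice for this sketch


def _hash_chunk(path: str, offset: int, size: int) -> str:
    # Each call opens its own file handle, so concurrent seeks don't interfere.
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.sha256(f.read(size)).hexdigest()


def chunk_digests(path: str, chunk: int = CHUNK, workers: int = 8) -> list[str]:
    """SHA-256 of each fixed-size chunk, computed in parallel, in file order."""
    offsets = range(0, os.path.getsize(path), chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so digests line up with offsets.
        return list(pool.map(lambda off: _hash_chunk(path, off, chunk), offsets))


def combined_digest(digests: list[str]) -> str:
    """Hash of the chunk hashes (a one-level Merkle tree): one value to compare.
    Not equal to SHA-256 of the whole file; both sides must use the same chunk size."""
    return hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
```

Per-chunk digests also tell you *which* chunk differs, so a mismatched transfer only needs those chunks resent.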

~~~
imhoguy
Or maybe a file should stay where it is and processing logic itself should be
deployed there. If file parts are distributed then processing could be
suspended and migrated to the place where the next piece is stored. Something similar
is done with Hadoop and HDFS.

------
maxxxxx
I remember when databases were only for highly paid experts and needed a lot
of customization and configuration. I think we need a similar development for
AI and ML to make them accessible to the average developer.

------
rs86
I wish we had more statically typed web backend frameworks as usable as Rails
or Phoenix. I have used Elm a lot recently and it rocks. I wish I had
something as incredible for the server side.

~~~
davidjnelson
I do too! Seems like the Play Framework with Java and the Sails framework with
TypeScript are a few existing options. I'd love to see something Rails-ish for
Go.

~~~
2_listerine_pls
.net core?

------
lowry
I miss a decent self-hosted photo/video library.

------
DannyB2
The missing piece?

It is a new project, it enables your server to use all other front end and
back end technologies at the same time! That is what makes it so fantastic!

You get all front end JavaScript frameworks, and all back end server
technologies, all for one low price. Just install this new component into your
project. It will download the other few gigabytes and hook it into your
project.

------
CryoLogic
1\. Static Website generators for non-blogs

2\. Half decent wrapper libraries for tools like FFMPEG in NPM

3\. Plug and Play game server with stat tracking that can be easily used for
any game

4\. More versatile compression formats for media, ability to serve up
360/720/1080 from the same file?
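
On points 2 and 4: the common practice today is to encode one rendition per resolution rather than one file serving all three, and a "half decent" wrapper can be a thin layer that builds those ffmpeg command lines. A minimal Python sketch, assuming `ffmpeg` is on the PATH when `run=True`; `scale=-2:H` is ffmpeg's scale-filter syntax for keeping the aspect ratio with an even width, and the output naming scheme is hypothetical:

```python
import subprocess
from pathlib import Path


def rendition_cmd(src: str, out_dir: str, height: int) -> list[str]:
    """Build an ffmpeg command that scales `src` to `height` pixels tall.
    scale=-2:H keeps the aspect ratio and rounds the width to an even number;
    -c:a copy passes the audio through without re-encoding."""
    dst = str(Path(out_dir) / f"{Path(src).stem}_{height}p.mp4")
    return ["ffmpeg", "-y", "-i", src, "-vf", f"scale=-2:{height}",
            "-c:a", "copy", dst]


def make_renditions(src: str, out_dir: str, heights=(360, 720, 1080),
                    run: bool = False) -> list[list[str]]:
    """Return the commands for every rendition; execute them if run=True."""
    cmds = [rendition_cmd(src, out_dir, h) for h in heights]
    if run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds


for cmd in make_renditions("talk.mov", "out"):
    print(" ".join(cmd))
```

Building the argument list separately from running it keeps the wrapper testable without ffmpeg installed.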

~~~
TheAceOfHearts
1\. It's hard to give constructive help without more information. Why not
write a script that compiles a bunch of pages into html? As the other comment
said, there's also stuff like Jekyll. Maybe you could provide some use cases?

2\. Explain your use-cases? Why not call ffmpeg directly? You'll need to
familiarize yourself with the original library functionality anyway, no?

3\. No opinion. Not particularly interested in games. Might be hard to
generalize, maybe?

4\. Versatile in what sense? What are your complaints with existing media
formats? I think licensing is an issue, but stuff like VP9 and Opus seem to
take care of that matter. I'm not an expert, but I think both MP4 and MKV can
already hold an unlimited number of media files.

------
pmohan6
I don't have great suggestions for you but I think software needs are ever
evolving. As more of the world comes online, there are more data requirements,
newer business requirements, etc. We would need even more scalable systems on
various dimensions.

------
biggodoggo
There are two ways to "fill a gap": either you create a gap that needs filling
or you make an existing solution better. You don't see anything missing
because you aren't looking at things from this perspective.

------
slake
Security probably needs quite a bit more OSS. There's lots of proprietary
software. There are a lot of penetration testing tools, but not so much
software for securing systems.

------
fooyc
ACID databases that can scale easily on multiple machines still don’t exist.

~~~
btown
[https://cloud.google.com/spanner/](https://cloud.google.com/spanner/) and the
open-source
[https://www.cockroachlabs.com/product/cockroachdb/](https://www.cockroachlabs.com/product/cockroachdb/)
both provide scalable ACID transactions; however, both introduce added latency
compared to traditional databases.

------
thesmallestcat
Static website generators.

~~~
m_ke
This might be a joke, but there's plenty of room for a good static WordPress
alternative that focuses on bloggers instead of developers.

~~~
pzk1
Netlify CMS?

------
sphix0r
Before storing data, we should have good do-not-track / privacy-respecting
software for the data we store.

------
deepnotderp
I want a way to use AWS Lambda/GCF but with cluster level network locality.

~~~
boulos
Interesting. Do you mean for functions to call each other (so that it’s
sub-ms) or for some other reason?

~~~
deepnotderp
Yeah, basically low latency versions of FaaS.

------
slake
A plugin to handle all of GDPR requirements?

------
tboyd47
It feels rather like we have too much software.

~~~
dspillett
An embarrassment of riches in the categories that are served does not preclude
the possibility that there is a need that isn't well served at all yet.

------
djswartz
I haven't found a good authorization service (fine-grained roles, ACLs, etc.).
I always find myself having to build it for every new project/company.
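
A minimal Python sketch of the core check such a service would expose, assuming a simple hypothetical model: roles grant actions on every resource, per-resource ACL entries grant actions on one resource, and anything not granted is denied. Role hierarchies, explicit deny rules, groups, and persistence are deliberately left out:

```python
from dataclasses import dataclass, field


@dataclass
class Authz:
    """Role-based grants plus per-resource ACL entries; deny by default."""
    role_perms: dict[str, set[str]] = field(default_factory=dict)          # role -> actions
    user_roles: dict[str, set[str]] = field(default_factory=dict)          # user -> roles
    acl: dict[tuple[str, str], set[str]] = field(default_factory=dict)     # (user, resource) -> actions

    def grant_role(self, user: str, role: str) -> None:
        self.user_roles.setdefault(user, set()).add(role)

    def allow(self, user: str, action: str, resource: str) -> None:
        self.acl.setdefault((user, resource), set()).add(action)

    def can(self, user: str, action: str, resource: str) -> bool:
        # 1) any role the user holds may grant the action on every resource
        for role in self.user_roles.get(user, set()):
            if action in self.role_perms.get(role, set()):
                return True
        # 2) otherwise fall back to a resource-specific ACL entry
        return action in self.acl.get((user, resource), set())
```

Keeping `can` as the single entry point is what would let this live behind one API shared by every project, instead of being rebuilt each time.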

