
Show HN: Musoq – Use SQL on various data sources - Puchaczov
https://github.com/Puchaczov/Musoq
======
asah
Tangentially related: check out the (awesome) Postgres foreign data wrappers
(FDWs) and the multicorn scripting library to write them. There's 100s of
connectors listed on the wiki.

[https://www.google.com/search?q=postgres+fdw](https://www.google.com/search?q=postgres+fdw)

~~~
Puchaczov
I wasn't aware of that, thank you!

I was targeting to be succesor of LogParser more :)

------
dmoreno
Similar, but as a library, I did ExoSQL, for elixir [1], and heavily inspired
by FDW from poatgres. I think there is great potencial on mixing data from
many sources!

I think exosql has more features (lateral joins, subqueries, JSON support...),
but I'm super happy to see more players in this field.

[1]
[https://GitHub.com/Serverboards/exosql](https://GitHub.com/Serverboards/exosql)

------
lixtra
If csv is your main use case, also have a look at qtextasdata[0].

[0] [http://harelba.github.io/q/](http://harelba.github.io/q/)

~~~
banku_brougham
i use this regularly, though importing to a local postges instance is a better
way. That one extra step activates my laziness.

------
barrkel
Have you looked at Apache Calcite?

Quite similar in very high level concept.

~~~
Puchaczov
Haven't looked at that, Looks very promising! I was creating Musoq to get rid
of scripting such things. Can't find is it possible to parametrize easily the
table you are connect into

------
duncan-donuts
What about including data from an actual database? Having a union of data
from, say, a rest api and data I have in my database would be really neat.

~~~
Puchaczov
Easily accessible thing you can do is to write a plugin that takes data from
database or databases and combine them with the second plugin that gets data
from your rest api

~~~
duncan-donuts
Is that plugin system documented? I thought I read that it was on the roadmap
but not an existing feature.

~~~
Puchaczov
look here:
[https://github.com/Puchaczov/Musoq/wiki/Plugins](https://github.com/Puchaczov/Musoq/wiki/Plugins)

You can also look how implemented are default plugins, especially flat file
one, it's pretty easy

[https://github.com/Puchaczov/Musoq/tree/master/Musoq.Schema....](https://github.com/Puchaczov/Musoq/tree/master/Musoq.Schema.FlatFile)

------
iblaine
Pretty cool project. If you're someone dealing with disparate data types, this
may be useful. That is a rare need, but interesting none the less.

Typically you create a connector once then walk away from it. Maybe this tool
solves the problem of having to create that connector.

------
pingec
This is so cool. What would be some real world use cases where this would come
in handy?

~~~
Puchaczov
This post will be a bit long, sorry for that :)

Mostly, I'm using it for ad-hoc querying as it allows for semi-automatic
columns detection. For example, my bank account file I am querying shows how
I'm spending my moneys. It looks like below and I have few more advanced
queries doing some different calculations on my CSV file.

select ExtractFromDate(DataOperacji, 'month') as 'Month',
ExtractFromDate(DataOperacji, 'year') as 'Year', SumIncome(ToDecimal(Kwota,
'pl-PL')) as 'Income', SumOutcome(ToDecimal(Kwota, 'pl-PL')) as 'Outcome',
SumIncome(ToDecimal(Kwota, 'pl-PL')) - Abs(SumOutcome(ToDecimal(Kwota, 'pl-
PL'))) as 'Total' from #csv.file('C:\some\path\my_account.csv', ';', true, 37)
group by ExtractFromDate(DataOperacji, 'month'), ExtractFromDate(DataOperacji,
'year')

from another point of usage, I'm sometimes using it to find files in folders
as I can both query file properties and their content. Also, I was testing it
with querying git repository (I was using
[https://github.com/libgit2/libgit2sharp](https://github.com/libgit2/libgit2sharp)),
If you are curious about some statistics, then probably it would be usable for
you.

Another thing I would imagine is that when you have such a huge compressed
file that you can't unpack, then you could query such a file and unpack only
what you needs. It doesn't have to be a packed file only. It can be such a big
log file that most of editors wouldn't be able to open.

Another thing, If you are an administrator and have to move buch of files or
create yourself a view how the files looks in more aggregated way, then it
will be probably good for you (I'm not an administrator and never was so
please take into consideration that my point of view may be clearly wrong)

Different thing, If you've got a bunch of IOT devices, you would end up
writing plugin that gets data from such devices and provides you rich
statistics about various things.

Another possibility, you need event (for scheduling purposes) that fits some
sophisticated time pattern, you can't use CRON becouse it's syntax has some
serious limitations, probably it will be better to query time for such a
pattern and store it somewhere. Otherwise you would need writing a lot of
single CRON expressions that all fits your requirements.

I will go right now into some crazy imaginations like:

You would end up writing a plugin that extract some statistics from photos
(like is the image contains people, how many of them are, what are the color
of sky, does it has sun?) and then query some huge photo sets about how many
of images contains people and also have a sky as a background). It should be
really fast as your plugin can use some C++ library to fastly process images.
Because you are fully controlling with what and how you feeds evaluator, you
can use parallelism to process multiple files at once :)

