
Show HN: FSQL – Search through your file system with SQL-esque queries - kshvmdn
https://github.com/kshvmdn/fsql
======
crivabene
Does anyone remember WinFS (1)?

Bill Gates described it as his biggest product regret (2).

I remember I thought it was brilliant. Too bad it was probably a little bit
too futuristic for its time, as for a few other things they launched when it
just was not the right time... the clunky Tablet PCs (3) were for sure another
example.

(1)
[https://en.m.wikipedia.org/wiki/WinFS](https://en.m.wikipedia.org/wiki/WinFS)

(2) [http://www.zdnet.com/article/bill-gates-biggest-microsoft-
pr...](http://www.zdnet.com/article/bill-gates-biggest-microsoft-product-
regret-winfs/)

(3)
[https://en.m.wikipedia.org/wiki/Microsoft_Tablet_PC](https://en.m.wikipedia.org/wiki/Microsoft_Tablet_PC)

~~~
UnfairIsaac
That was my first thought, too. I'd love a file system that was more like a
relational database.

Also see the Pick operating system.

~~~
LgWoodenBadger
Transactional ACID updates to file systems would be pretty fantastic.

~~~
mpfundstein
yes. thinking about this for a long time already

------
misterdata
For those of us on Windows: Everything [1] does the job quite nicely with much
less verbose syntax.

[1] [http://voidtools.com](http://voidtools.com)

~~~
degenerate
I've been using this for years and it is LIGHTNING fast.

No need to "index" all the files because it reads directly from the MFT. If
you create a new file matching the search pattern it's already sitting in the
results window by the time you alt-tab.

Also, the "directory size" equivalent of Everything is WizTree [1] ... much
faster than WinDirStat, which I see recommended way too often.

[1] [http://antibody-
software.com/web/software/software/wiztree-f...](http://antibody-
software.com/web/software/software/wiztree-finds-the-files-and-folders-using-
the-most-disk-space-on-your-hard-drive/)

~~~
kevindqc
Are you sure it doesn't index anything? There is even a section in the
settings called Indexes.

Also when you launch it first, it's going to be empty and says it's scanning
your folders and it takes a little bit until you see something.

I think it is still indexing (maybe using the MFT instead of recursively
listing files and directories), it's just a lot better than Windows search
indexing. And it might use this [1] to keep up to date? It's mentioned in the
settings

[1]
[https://en.wikipedia.org/wiki/USN_Journal](https://en.wikipedia.org/wiki/USN_Journal)

~~~
degenerate
No I'm not sure. Maybe it builds a rudimentary index... but give it a shot
yourself and see. 15 seconds after installing, you can search your entire
system instantly. It's crazy whatever it is doing.

And yes I do believe it uses the USN Journal to stay up to date.

~~~
kevindqc
There's actually a wikipedia page, and it explains how it works. As we
suspected, it uses the MFT for initial indexing and change journal for
updates.

[https://en.wikipedia.org/wiki/Everything_(software)](https://en.wikipedia.org/wiki/Everything_\(software\))

------
rekwah
Reminds me of osquery [0].

[0] -
[https://github.com/facebook/osquery](https://github.com/facebook/osquery)

~~~
MrBuddyCasino
Doesn't Windows have something like this built-in? WMI or something?

~~~
nunez
yeah; wmi is incredibly powerful. before osquery linux and os x had nothing
like it. it even has performance counters (albeit at slower intervals than win
etl) at the ready.

------
bangonkeyboard
On macOS, there is a query syntax [0] that's usable in Spotlight and the
mdfind(1) command. Richer searchable attributes [1], but the results may have
to be piped through other tools for formatting or other output.

[0]:
[https://developer.apple.com/library/content/documentation/Ca...](https://developer.apple.com/library/content/documentation/Carbon/Conceptual/SpotlightQuery/Concepts/QueryFormat.html)

[1]:
[https://developer.apple.com/library/content/documentation/Co...](https://developer.apple.com/library/content/documentation/CoreServices/Reference/MetadataAttributesRef/Reference/CommonAttrs.html)

------
m0skit0
I think SQL is too verbose for use on the terminal. find + grep does the trick
with way less verbose syntax (but also probably less readable). With that
said, it is quite cool.

------
CaseFlatline
Don't see it offhand so asking:

1) How/where are you storing the index 2) Have you tried this on large (30+ TB
filesystems)?

~~~
agumonkey
even without an index, having a way to project declaratively instead of
relying on cut/sed is giving me hot flashes.

------
atemerev
I wanted to write the exact opposite: a Mysql/Postgres client as a FUSE
filesystem driver. Namespaces -> folders, tables -> (editable) CSV files,
stored procedures and settings accessible as (editable) plain text files.

~~~
bbcbasic
Sounds dangerous!

~~~
atemerev
No more than DELETE FROM ... WHERE ... wait, where is WHERE?

~~~
philsnow
Even this can be made safe(r) if you only only only connect to your database
nthrough a proxy that sanitizes queries. IIRC vitess adds an implicit LIMIT 10
to queries that don't have a limit.

------
m00s3
Seems if the query is always going to start with SELECT, that maybe it should
be assumed? I would never use this though, ack or find seem sufficient to me.

~~~
dTal
It's a shame Bash used 'select' as an elaborate menu built-in - it'd be quite
neat to name the binary that (and drop the quotes). The you could just type
the query right into your prompt!

~~~
labster
Just use the fish shell instead, and you can avoid the years of shell cruft of
bash, or the endless customization of zsh. Bash is a good environment for
shell scripting, but not really the best for user interaction. Although perl
is probably the best environment for shell scripting.

~~~
fnj
Xonsh is far better both interactively and for scripting than bash, fish, zsh,
or perl.

------
aardvark291
Didn't BeOS have some awesome database-like file system indexing and query
system?

~~~
Koshkin
Sure; in general, a file system endowed with extended/extensible attributes
can be naturally seen as a relational database (in which the files themselves
are BLOBs).

~~~
nayuki
This video talks in detail about the extended attributes in BeOS:

[https://systemswe.love/archive/minneapolis-2017/ivan-
richwal...](https://systemswe.love/archive/minneapolis-2017/ivan-richwalski)
\- "Metadata Indexes & Queries in the BeOS Filesystem"

[https://player.vimeo.com/video/209021697](https://player.vimeo.com/video/209021697)

------
jklehm
Reminds me of WSSQL [0] on Windows.

[0] [https://msdn.microsoft.com/en-
us/library/windows/desktop/bb2...](https://msdn.microsoft.com/en-
us/library/windows/desktop/bb231255\(v=vs.85\).aspx)

------
ziikutv
find/grep/awk/ag get me a long way to be honest. However, I think this is a
cool project because it makes filtering of file attributes (such as size) so
much easier. No need for splitting strings and using regex. Cheers.

~~~
anuragbiyani
Not to take anything away from this project, but you can filter on size,
permission, etc easily and robustly using just `find` (try -perm, -size,
-{c,m}time, etc flags):
[https://linux.die.net/man/1/find](https://linux.die.net/man/1/find)

If you are splitting strings (from output of `ls -l` presumably) for such
tasks, then definitely take a look at find.

------
devnonymous
Nice project, wish you the best ! Although tbh, I personally won't use this
simply because I know enough of find(1) to not see the cognitive overhead of
switching to sql to do filesystem /queries/.

Any examples where this would be better than using find (with the occasional
filter thrown in) ?

~~~
_jal
Subselects would be a pretty awesome feature.

"select name from foo where name not in (select name from ../bar where date <
...)"

I'm usually fine with `find`, but when doing things more interesting than just
"find files in this directory that are not in that directory", while uncommon,
tend to make me think about my pipeline a bit.

~~~
taeric
Out of curiosity, I'm interested in how folks would do the "in this not that"
folder query. At a gut shot, I'd assume that diff would be used. I'm about to
dig through the find man page to see if it has something directly to help.

~~~
hoytech
Assuming no dups in file1, this outputs lines in file1 that aren't in file2:

    
    
        sort file1 file2 file2 | uniq -u
    

(double file2 is not a typo :)

~~~
isometry
Or (a little faster):

    
    
      comm -23 <(sort file1) <(sort file2)

------
vondur
I think the BeOS had a file system that was set up like a database that could
be queried.

~~~
danans
It sure did. Alas at the time I had a BeBox in college (mid-late 90s) I didn't
know SQL yet ;) I think it just searched over file metadata, not contents,
though I might be mistaken.

------
samsk
We have implemented smth. like this with sqlite extension, pretty powerful,
with all the goodies sqlite (and its extensions) provides...

~~~
kshvmdn
Sounds interesting, would love to take a look if it's available anywhere.

~~~
samsk
Unfortunately, it was commercial development, so it's not released. But
implementation is relatively easy - it was a sqlite virtual table that (as
much as I remember) looked in where condition for dir field, and listed that
directory (= returned stat() data). Whole thing was quite interesting, because
almost every component was somehow hooked into sqlite (either vith function or
virtual table), so one could do pretty interesting things only with SQL.

------
amasad
Did someone come up with a generalized rule about putting SQL on top of every
possible system that contains queryable information? Here is first-pass:

>eventually every system that contains information that can be queried will
have a sql interface

~~~
Koshkin
SQL is the standard language for querying relational data, so why not.

~~~
Retra
It's not a great standard. Practically keywords the whole English language...

~~~
devnonymous
Considering its longevity as compared with other 'standards' that came around
40 years ago, I'd say it is in fact, a great standard.

------
bpchaps
Nice! I'm actually working on a similar project to push lsof and files from
/proc into some postgres tables. Lets me do cool things like query log files
across a ~6000 server infrastructure similar to:

    
    
      SELECT distinct(l.name) 
        FROM lsof l, lsofer_runs r 
      WHERE l.lsofer_id = r.id 
        AND fd_type = 'REG' 
        AND l.fd ~ '[0-9][uw]' 
        AND l.name like '%log' 
      GROUP BY l.name, r.hostname 
      ORDER BY name
    

Best of luck!

~~~
matthewaveryusa
so you're rewriting osquery? [https://osquery.io/](https://osquery.io/)

~~~
tyingq
His description sounded like it would do joins across different hosts. Osquery
looks to be single host at a time only.

~~~
bpchaps
Yep.

I'm specifically writing it to find any log file that isn't being pushed into
our third party logging service. It's a surprisingly difficult problem,
especially considering the amount of tech sprawl that's accumulated. Since
it's also a relatively low latency environment, it has to be written in a way
that doesn't add too much load (without core isolation..).

~~~
tyingq
You could use the audit subsystem. [https://www.linux.com/learn/customized-
file-monitoring-audit...](https://www.linux.com/learn/customized-file-
monitoring-auditd)

~~~
bpchaps
Definitely crossed my mind, but I'm working on hosts where installing auditd
isn't really easy. Broken yum and apt all over the place makes installing new
packages almost impossible. Same goes for lsof, but its installed in "enough"
places. Kinda nightmarish, but it gives me a chance to write some fun code ;).

Also, thanks for the article! Super interesting. Think that'd be better than
implementing something on top of sysdig?

~~~
tyingq
Another non polling option:
[http://www.brendangregg.com/blog/2014-07-25/opensnoop-for-
li...](http://www.brendangregg.com/blog/2014-07-25/opensnoop-for-linux.html)

~~~
bpchaps
+1 for anything by Brendan Gregg.

I wasn't aware of the polling limitations of sysdig, but it definitely
explains some things I've seen in the past. This is definitely going in my
toolkit. Cheers!

Edit: dammit, spelled his name wrong.

------
Jaruzel
This is nice, but what I'm actually looking for is a lightweight clone of
SharePoint Search[1].

Something that has a self-hosted Web Interface, and an engine that I can point
at some file servers, and let it index the files to it's hearts content. All I
then have to do is search the index 'google style' for my files.

Any suggestions?

\--

[1]
[https://i-technet.sec.s-msft.com/dynimg/IC423463.jpg](https://i-technet.sec.s-msft.com/dynimg/IC423463.jpg)

------
zitterbewegung
This is really neat. MacOS has a easy to use smart folder which I use to find
recent files and large files. An interface like this is an advantage because
it's easy to understand what it's doing and it's cross platform . Other people
make the claim it may be verbose (but being verbose makes the operation
clearer) and SQL is so familiar to programmers that are power users.

------
jstimpfle
_FROM dir1, dir2_ doesn't mean the same as in SQL. In SQL that's a join of
dir1 and dir2, but here it's a union.

~~~
kshvmdn
Hence the _esque_. :)

Good point though, I'll make a note of that in the README.

------
rodorgas
I never know when I have to use find vs grep. And linux grep is different from
macOS grep so I google about it every day lol. I just never figure it out. I
think I'll be a heavy user of FSQL.

------
legulere
Reminds me a bit of SPARQL, that is used on the linux desktop e.g. by Gnome
Music to find your music collection through Gnome Tracker

------
gigatexal
Kudos to the authors for taking an idea I've had for a while and actually
doing it. Very, very cool.

------
drinchev
Really cool idea, but I'm really missing the WHY section in the Readme file.

Thinking about a use case is quite hard. Anyone?

------
rs86
Really cool idea

------
adamhepner
This is exactly why I love PowerShell.

