It looks like this is the benchmarking code; I'll have to go over it out of curiosity sometime: https://www.sqlite.org/src/file/test/kvtest.c
Hey, maybe they could put rounded corners on the file deletion window, I bet that would fix it!
I worked at MSFT for eight years as a full-time, Flavorade-guzzling employee, and fifteen years later rarely a day goes by that I don't question how Microsoft manages to get anyone to buy that steaming pile they call an operating system.
Kids get accustomed to Windows and Microsoft Office in schools thanks to Microsoft's lobbying, so when they grow up they keep using Windows and pay for Office.
Everyone uses Windows and Microsoft Office, so most companies and organizations have to pay for licenses for each of their employees.
People know Windows, so they develop on Windows, so they get locked into yet more Microsoft products and libraries and have to run Windows on servers. That's yet more licenses for Windows and for Visual Studio.
Their business strategy would almost be impressive if it weren't terrifyingly harmful.
I actually really like the rest of what Microsoft does though.
VS Code, TypeScript, creating the Language Server Protocol, the .NET Foundation/.NET in general and C#, the list goes on and on. They're an influential organization which (depending on what kind of software you build/what you write) probably has a significant impact on your day-to-day experience as a developer.
The vertical integration you get when one company (for lack of a better word) "owns" all these things can be beneficial for the end-user/developer, ignoring whatever anti-competitive arguments there are.
It's the same argument I hear from Apple users -- Apple stewards their digital experience and gives them continuity end-to-end.
Microsoft -- by nature of owning my OS, IDE, the specification and dev team driving my primary language, a cloud provider, and having their fingers in all of these parts of my digital life -- has the ability to be really awesome for me.
They could also do really sinister things if they wanted and I'd be real fucked I suppose, but that's a(n unlikely) risk I accept.
"But muh games!"
The Windows filesystem has dismal performance. The Windows graphics stack is arguably better than Linux’s. And it’s not just the drivers, so you can’t just blame it on GPU vendors, though poor support from a certain popular GPU vendor definitely doesn’t help.
Some of it is just the little things, like handling mouse movement at very high priority in kernel space. Other parts are bigger and more impressive, like allowing the graphics driver to crash/upgrade/restart without you also losing all of your open windows.
Couple decades ago for me.
> Yeah, 'cause the incredible fragmentation of Linux GUI toolkits and windowing systems is so fun to deal with.
Sounds like probably "never" for you. You seem hostile toward Linux, so you're probably better off sticking with whatever you use currently.
With Linux, the main difference is whether your desktop is primarily Qt or GTK. Sometimes an app using the other toolkit will look kind of funny, but most of those differences are paved over by any competent distro (e.g. Debian, Fedora, Ubuntu, openSUSE...).
GUI toolkits are irrelevant to games.
Haha. Maybe because it works? I do miss XP, though. It felt like the only OS that could work just as well on a PC as on a tank.
Try the command line: "rmdir /s /q" is quick (the /q suppresses the confirmation prompt).
>I think for the purpose of this discussion, explorer IS Windows.
The article is about the file system, not the UI.
I'm actually a big fan of Windows on the desktop, and as long as I avoid Explorer, I have no issues at all with file system performance while developing - working from the command line, VS Code and Rider, and working with containers and WSL all works great. But Explorer is such a central component of Windows that it affects pretty much all users. I genuinely don't understand how it can be so rubbish ¯\_(ツ)_/¯
I once had a PDF file on the desktop, and for some reason this file was causing explorer.exe to freeze under certain circumstances. It's all a big mystery.
Such is the nature of closed source black box software.
And even its Item Handler system is flawed. Still on the subject of .ts files: the context menu can stutter when I open it on a .ts file, then freeze forever if I so much as hover over the "Open with" submenu.
Also, is there a better way to hunt down the background process than Sysinternals? It seems pretty ridiculous that a low-level debugging tool is required to do something as modest as reliably moving and deleting files, but I've never heard of a better workaround.
This is how it is on Windows. Linux handles open file deletion a lot better.
I'm struggling to understand why one shouldn't treat the behaviour of explorer.exe as simply part of the Windows system.
Are there third-party explorer.exe alternatives available?
If so, during Windows install, where do I click to _not_ install the MSFT explorer.exe and instead select one of the better alternatives?
You'd point a regkey at your program (the Shell value under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon, if I remember right). There used to be a 3.1-style Program Manager - I miss that one
Although Windows remains my favorite environment, I share the annoyance at how MS really polishes some things while failing on basic functionality, when they definitely have the resources and skills to fix it. Every time my wifi or WAN hiccups, I wonder why Windows still lacks decent network history/QoS tools out of the box.
For power users, you're much better off using the command-line tools.
EDIT: Does the CLI override the "open in background" check? Or do you still need Mark Russinovich's Sysinternals to reliably perform elementary file system administration tasks in Windows?
"Windows's IO stack is extensible, allowing filter drivers to attach to volumes and intercept IO requests before the file system sees them. This is used for numerous things, including virus scanning, compression, encryption, file virtualization, things like OneDrive's files on demand feature, gathering pre-fetching data to speed up app startup, and much more. Even a clean install of Windows will have a number of filters present, particularly on the system volume (so if you have a D: drive or partition, I recommend using that instead, since it likely has fewer filters attached). Filters are involved in many IO operations, most notably creating/opening files."
I think the real issue is that you shouldn't mix the two operating system paradigms and it's far better to just run Linux in a VM and benefit from the near-native performance at the cost of a tiny bit of inconvenience. It's not a bad option when you consider the remote IDE capabilities that VS Code gives you, which is the one product they're doing 100% right.
I'm also not sure there are two paradigms here, if by that you mean there are workloads more suited to Linux and workloads more suited to Windows. Are there any file system operations that are actually faster on Windows? I'm still inclined to believe, like they say, that "file operations in Windows are more expensive than in Linux," even if large files are less impacted than small files.
I really hope Microsoft can improve this, rather than just sweeping it under the rug and saying to use Linux VMs, even if they have to make drastic changes like deprecating filesystem filter drivers completely. I think a lot of us depend on the performance of software that was designed for fast filesystems and ported to Windows (not just Git, either.) I also think a lot of tasks fundamentally use a lot of small files. After all, what is source code?
MSFT don’t listen to or care about their customers’ findings from experience. That’s why we canned our partner status.
In this case they haven’t touched NTFS at all - that’s outright rubbish.
I pointed out the paradigm mismatch already. Run Unix loads on Unix, not NT.
NTFS really does quite well on large files, but it is hopeless on small Unixy things and source code. I first learned about this trying to make Subversion fast about a decade ago.
You can tune some of this out with fsutil (8.3 short-name generation and last-access timestamps are the usual suspects, IIRC), but it gives marginal gains. It's depressing, though, when you have a VM on Windows that is faster than the native OS filesystem :(
Simply moving from files to a DBMS wasn't an option, since the whole point of this edge system was to allow third parties to FTP/SFTP/FTPS/etc in and out, to put/get files. We would have had to re-implement multiple transfer technologies/protocols to have a file-less database system (which didn't make cost-effective sense).
I've asked this question several times: "Why would an OS need an SSD to work normally and be responsive when it was already fast without one some years ago?" I often get responses implying I don't know how computers work, that Win10 takes advantage of SSDs and cannot work well on old tech. Even when you have a high amount of RAM and use an HDD, it's still slow.
I'm still looking for a decent filesystem or kernel developer to answer that question with a good enough explanation, because as it stands, either I'm stupid, or there is negligence, or some attempt to increase SSD sales. Maybe Windows now uses the SSD as some sort of "secondary RAM", and thus will never be fast on an HDD.
I mean, Linux desktops seem to still be fast enough with mechanical HDDs.
As a daily Windows user, I'm not surprised if it is Defender. I have some files and folders on the exclusion list because Defender was interfering with files the software was trying to use.
And Microsoft just doesn’t seem to care :-(.
No clue on what the actual underlying issue is. And it's not like we can easily switch filesystems to compare.
I mean, I added C:\ as an exception because I don't need realtime AV, but I don't think it will actually work like that.
"SQLite is much faster than direct writes to disk on Windows when anti-virus protection is turned on. Since anti-virus software is and should be on by default in Windows, that means that SQLite is generally much faster than direct disk writes on Windows. "
One would assume that it gets better with every upgrade, but a factor of 2x worse seems to indicate that they added some kind of logging or something.
Like the logging/access system Android started to integrate in Android 9, where every access needs to go through the StorageManager.
I believe it is caused by some per-file overhead that NTFS incurs, though I'm not 100% sure.
I would guess their surprise is because the charts specifically show Win10 as being 100% slower than Win7 in the first chart, and still a good 30% slower in the best case (third chart).
That's not NTFS v other, that's Win7 v Win10, both on NTFS.
I don't care what version of Windows you're on, or what version of NTFS you're on; NTFS is dog slow on small files.
I am not sure that NTFS has changed versions between Windows 7 and Windows 10, so let's just assume that it hasn't.
If the limitations of the filesystem do not change between operating systems, then what must change is how the OS deals with the filesystem limitation, and/or attempts to work around them.
Windows 7 does better than Windows 10. That doesn't mean NTFS isn't slow.
A completely pointless remark when versions of NTFS and Windows are correlated. And AFAIK there have been no updates to NTFS itself since the addition of symlinks in NTFS 3.1, released with Windows XP. All "filesystem features" since were added at the OS level, using existing NTFS features.
> If the limitations of the filesystem do not change between operating systems, then what must change is how the OS deals with the filesystem limitation, and/or attempts to work around them.
Yeees? Which would make the issue vary based on the specific version of Windows maybe?
> Windows 7 does better than Windows 10. That doesn't mean NTFS isn't slow.
No, but I never claimed such a thing. I'm just pointing out that your assertion that
> It's […] not any specific version of Windows.
> is bullshit.
No, it's not. NTFS is slow, even when it's running under Linux. NTFS has a very pronounced weakness, here.
Performance changing between Windows 7 and Windows 10 means that ON TOP OF NTFS there are other performance problems in Windows 10. It does not mean that it isn't an NTFS problem in the case of Windows 10 -- it means that it isn't ONLY an NTFS problem in the case of Windows 10. It's an NTFS problem in addition to whatever other things Windows 10 is screwing up, when compared to Windows 7.
The difference between Windows 7 and Windows 10 alone? Yes, that's entirely because of Windows 10 and not because of NTFS. There. I never disagreed with that.
Are there any notable trade-offs or limitations you’ve encountered with this approach?
> SQLite as the backing store for file records, though not the file contents
> I imagine I could use it for file content as well
> I imagine I could use [SQLite] for file content as well
The measurements in this article were made during the week of 2017-06-05 using a version of SQLite in between 3.19.2 and 3.20.0.
There’s also certain tech that HN, in aggregate, strongly dislikes and readily upvotes negative articles about - Mongo, anything “modern JS ecosystem”, systemd, etc.
But then if said startup gains traction and the team/codebase/systems grow a lot, it can easily become hard to maintain, and you probably wish your backend was implemented in, say, Go/Postgres over Node/Mongo. Or that your mobile apps were written in Swift and Kotlin over React Native. And I think a lot of the HN crowd works at "startups becoming big businesses", so this is probably a common headache. But it doesn't necessarily mean the tech is BAD, just that it's good for certain things (like early days productivity), but a pain for others (maintainability as the system scales).
SQLite is a bit unique in that it's just a super high quality piece of software, that is arguably the best short AND long term solution for the problem it solves (mostly being an embedded DB). But for software where it's more of a tradeoff around early productivity vs. long term maintainability, I think HN is pretty strongly on the long term maintainability side, and that's more of an opinion/choice than a clearly "correct" answer.
Some people see the main post, do some more research on the topic, find something cool / interesting, and post it here. Then this induces similar reactions, until interest drops low enough for the articles to disappear.
But then again, some recurring topics seem to stick around forever. It seems like there's a new Rust article here almost daily.
In the case of SQLite, it's probably the former.
Articles also come in batches (of a topic). Somebody sends something about X (e.g. SQLite) and then others see it, some start reading more about X on their own and digging up further articles, and inevitably a few will post those other articles they've found as well. Rinse, repeat.
Age of article also has little to do with being on the front page of HN. There are articles that have been submitted, and made it to the front page, dozens of times from 2010 to today, and their original publication date can be half a century old (e.g. some old Lisp articles, Dijkstra articles, "Worse is better", etc.).
In some way it is the Platonic Ideal of what a SQL database should be (from a developer's perspective). There isn't any mucking about with dev-ops configuration, because the DB is file based. Just type "sqlite3 out.db" and you have a database to write queries against. At the same time it still manages to be quite performant.
SQLite is 35% faster reading and writing within a large file than the filesystem is at reading and writing small files.
Most filesystems I know are very, very slow at reading and writing small files and much, much faster at reading and writing within large files.
For example, for my iOS/macOS performance book, I measured the difference writing 1GB of data in files of different sizes, ranging from 100 files of 10MB to 100K files of 10K each.
Overall, the times span about an order of magnitude, and even the final step, from individual file sizes of 100KB each down to 10KB each, differed by a factor of 3-4.
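That experiment is easy enough to approximate. A rough sketch of the kind of harness I mean (names and sizes are placeholders, and you'd want to flush the OS cache between runs for honest numbers):

    /* Rough sketch: write ~1GB as N files of S bytes each and time it.
       POSIX timing; file names and sizes are placeholders. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double now(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    static void write_files(int n, size_t size) {
      char *buf = calloc(1, size);   /* zero-filled payload */
      char name[64];
      double t0 = now();
      for (int i = 0; i < n; i++) {
        snprintf(name, sizeof name, "f%06d.bin", i);
        FILE *f = fopen(name, "wb");
        fwrite(buf, 1, size, f);
        fclose(f);
      }
      printf("%6d files x %9zu bytes: %.2fs\n", n, size, now() - t0);
      free(buf);
    }

    int main(void) {
      write_files(100,    10u * 1024 * 1024);  /* 100  x 10MB  */
      write_files(10000,  100u * 1024);        /* 10K  x 100KB */
      write_files(100000, 10u * 1024);         /* 100K x 10KB  */
      return 0;
    }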
The better way to do this would be to have Node access SQLite's modules directly.
There is overhead in small-file storage anyway, if the files are not the exact size, or a multiple, of the sector size. Storing 1KB files on a disk where the sector size is 16KB is far more of an overhead expense than storing those files in a SQL database: 100,000 such files would occupy roughly 1.6GB of disk for 100MB of actual data.
35% Faster Than The Filesystem (2017) - https://news.ycombinator.com/item?id=20729930 - Aug 2019 (164 comments)
SQLite small blob storage: 35% Faster Than the Filesystem - https://news.ycombinator.com/item?id=14550060 - June 2017 (202 comments)
That's what's happening here. Storing files in SQLite removes any per-file overhead in the filesystem. The filesystem now only has one file to deal with, instead of however many are stored inside the SQLite database.
This is a very real phenomenon, and definitely not a joke.
The Windows file system is still as dumb as it was in NT 3.1 -- hardly any changes since then.
Should SQLite be modified so that its code, when compiled with a specific #define flag set, can act as a drop-in filesystem replacement for an OS?
I wonder how hard it would be to retool SQLite/Linux -- to be able to accomplish that -- and what would be learned (like, what kind of API would exist between Linux/other OS and SQLite) -- if that were attempted?
Yes, there would be some interesting problems to solve, such as how would SQLite write to its database file -- if there is no filesystem/filesystem API underneath it?
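Worth noting that SQLite already answers part of this: all of its I/O goes through a pluggable VFS layer (sqlite3_vfs / sqlite3_io_methods), so "no filesystem underneath" would mean registering a VFS whose callbacks hit a raw block device instead of files. A very hand-wavy sketch -- the "rawdisk" name and the stubbed callback are mine:

    /* Hand-wavy sketch: a custom VFS is how SQLite would talk to raw
       storage with no filesystem underneath. "rawdisk" and rawOpen are
       hypothetical; every other callback must also be implemented. */
    #include <sqlite3.h>

    static int rawOpen(sqlite3_vfs *vfs, const char *zName,
                       sqlite3_file *file, int flags, int *pOutFlags) {
      /* Real code would open a block device (e.g. /dev/sdb1) and fill
         in file->pMethods with xRead/xWrite/xSync/xLock/... handlers. */
      (void)vfs; (void)zName; (void)file; (void)flags; (void)pOutFlags;
      return SQLITE_CANTOPEN;  /* stub */
    }

    static sqlite3_vfs raw_vfs = {
      .iVersion   = 3,
      .szOsFile   = 0,          /* would be sizeof(our file struct) */
      .mxPathname = 512,
      .zName      = "rawdisk",  /* hypothetical */
      .xOpen      = rawOpen,
      /* xDelete, xAccess, xFullPathname, xRandomness, xSleep,
         xCurrentTime, ... all need real implementations. */
    };

    int main(void) {
      /* Register it; 0 = make it available, but not the default. */
      return sqlite3_vfs_register(&raw_vfs, 0) == SQLITE_OK ? 0 : 1;
    }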
Also -- one other feature I'd like -- and that is the ability to log up to everything that the filesystem does (if the appropriate compilation switches and/or runtime flags are set) -- but to somewhere else, on a different filesystem!
So maybe two instances of SQLite running at the same time, one acting as the filesystem and the other acting as the second, logging filesystem -- for that first filesystem...
Is this just a blatantly dishonest comparison, or did I miss something?
I came across the write cache bit accidentally when trying to minimize wear on an embedded flash device. It was borderline enough that I thought I had done something wrong.
Minimizing open/close and keeping your reads/writes close to the sector/cluster size on many filesystems can produce some very nice results, since you minimize the context switches from user space to kernel. With a 'db' layer added in, packing probably helps as well, since slack on small files is a huge percentage of the total file. So you'd be more likely to hit the file cache, as well as any caches built into your stack.
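Even plain stdio buffering illustrates the point -- something like this, where 64KB stands in for whatever your cluster/erase-block size actually is:

    /* Sketch: batch many small writes into cluster-sized syscalls.
       The 64KB buffer and 100-byte record size are assumptions. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
      FILE *f = fopen("out.bin", "wb");
      if (!f) return 1;
      /* One kernel transition per ~64KB instead of one per record.
         Must be set before the first read/write on the stream. */
      setvbuf(f, NULL, _IOFBF, 64 * 1024);
      char rec[100];
      memset(rec, 'x', sizeof rec);   /* stand-in for a real record */
      for (int i = 0; i < 100000; i++)
        fwrite(rec, 1, sizeof rec, f);
      fclose(f);                      /* flushes the final partial chunk */
      return 0;
    }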
This was one of the reasons why SQLite was an easy choice.