Hacker News
Why sqlite3 temp files were renamed 'etilqs_*' (2006) (github.com/mackyle)
396 points by bsmith89 on June 12, 2023 | 139 comments



I have a related story:

Around the year 2000 I was working operations in the NOC for WebTV (then owned by Microsoft). For those who don't know, WebTV was a little set-top box with a modem which would dial up on demand and provide a very basic web/chat/email experience on the TV. The box would call a 1-800 number to figure out its own phone number, then re-dial on a local number with a local sub-contracted ISP.

One of the services we had would periodically send a UDP datagram out to online clients to let them know they had new email. The set-top box would then light up a little indicator light.

Of course, sometimes the client would hang up. The IP might get allocated to a PC dialup user. And sometimes, that PC dialup user might be running a firewall that was popular back then, called BLACK ICE DEFENDER.

BLACK ICE DEFENDER had all these (not so) cool features, the kind that semi-technical people love. For example, it would log ATTACKS. What are ATTACKS? Unrecognized traffic, of course.

Sometimes the little UDP datagram for our "you have mail" service would be delivered to a PC user running BLACK ICE DEFENDER, which would register it as an ATTACK. It would then ever so helpfully look up the ARIN contact information to see who sent the errant datagram -- which had the NOC phone number. It would then tell the user "THIS ENTITY IS HACKING YOU" and imply that contacting them would be productive. Yes, you could pick up a phone and call the Microsoft NOC. Back then, the internet was a smaller place.

My job was to check the NOC voicemail, which was reliably filled with very angry people. Often they would claim they'd reported us to the FBI or some such, or that it confirmed some conspiracy theory or other. We played the good ones on speakerphone for entertainment.

Good times. Doesn't happen anymore.


Great story :)

The same year I was working on various online music stores using Microsoft's Windows Media DRM. This would cause a licensing window to pop up in Media Player all the time when the license was missing or expired for someone's music. We would get various complaints emailed to us, and being head developer sometimes the thornier ones would end up in my inbox, and my friendly ass would be kind enough to reply and try to figure them out.

One time one of the senior execs was walking past my screen and peered over my shoulder to read a new email which said "EVERY MORNING YOU ARE ON MY COMPUTER GET OFF MY COMPUTER!!!". The exec leaned over, typed "GO FUCK YOURSELF" and hit reply.

The benefits of being the boss...


Microsoft ends up, wrongly, in the firing line for a bunch of things, similar to the story you told and to the sqlite_ comment in TFA.

Notice that on iOS (for example) if an app crashes it just disappears > poof! < and it’s gone.

On Windows when an app crashes (or you kill a non-responsive app), you then see a dialog with Microsoft Windows branding saying that it is logging that the app crashed. As nerds we understand why… but the result is that the user doesn't curse the app, they curse Microsoft. Whereas when an iOS app disappears they curse the app, not Apple.


macOS is a better comparison to Windows than iOS is, since iOS is just for iPhones and not desktops. macOS will show you the stack trace of a crashed app in a dialog. This gives more technical people a chance to look into the reason for the crash. Users have the option to share the stack trace with Apple.


Does Windows show the stack trace? I've been very appreciative of it in OS X as it's helped me identify how to avoid some


I can't speak for the latest version, but previous versions did. If the program had debug symbols in it, it would show you pretty useful info and would even let you debug it! The debug feature rarely worked, though.


Windows often (or can be set to almost always) logs dumps of a program that crashes, and you can open them up in WinDbg and see stack traces and things.


The people who left those voicemails are now the ones leaving very odd and paranoid posts on GitHub issue trackers. There's a few unfortunate souls out there convinced that something called Lighthouse is trying to hack them.


The developer behind curl still receives abuse from confused and crazy people despite being an objective force for good in the software world.

https://daniel.haxx.se/blog/2021/02/19/i-will-slaughter-you/ is proof that there are people out there whose mental illness will result in threats on your life (at least this specific person apologized later: https://daniel.haxx.se/blog/2021/08/09/nocais-apology/).

It's difficult to blame people for their mental illness, but a valid excuse doesn't make dealing with this crap any better. At least paranoid Github issues don't usually come with death threats...


It's difficult to blame people for their mental illness, but the damage they do as a result of their actions is real, and while you can forgive you should never forget.

I got romantically and sexually involved with a person who had schizophrenia; they would occasionally make very wild and inaccurate accusations toward me, and when I broke it off they began stalking my friends, family, and me. At some point the episode they were having ended and they stopped, and later they said they forgave me for what I did (which was nothing).

While I sympathize that some of the emotions they were feeling were out of their control, the act of buying airplane tickets, renting cars, and harassing friends, family of mine, and me IRL was over the top and frankly, imo, unforgivable even with a diagnosis of mental illness.


I always imagine them as being a lot like Ron from GTA5.


Holy moley, my first “real” computer ran 98SE and had BLACK ICE, which behaved exactly the way you described.

I remember I would take the IP addresses of the “attackers” and plug them into a McAfee graphical tracert tool, and it felt like nothing short of something out of GoldenEye.

Thanks for the memories!


Similar story: Friend and I put out a kernel driver (uxstyle.sys) that would patch Microsoft's theming digital signature checks. It was free, buggy, and bugchecked the OS on upgrade. It was unsurprisingly added to compatibility blocks in Windows. I fixed the bug and asked Microsoft to loosen the block (version X and below). Microsoft refused citing a EULA violation. Valid or not, I renamed the driver to elytsxu.sys to circumvent their check and the app worked well enough until third-party theming fell out of favor.


On the rare occasion I install a Windows XP VM nowadays, I still patch uxstyle.dll and install my favorite style: DeMx[0].

[0]: https://www.deviantart.com/xandaman/art/DeMx-29075379


I often search for weird files in my %userprofile% (there are a lot of random ones) just out of curiosity, even though I know they're not malicious.

It doesn't help that if you Google any filename, or even any semi-obscure file extension, there are always plenty of blogspam articles saying it's a "possible virus". And oftentimes there's no legitimate article explaining what they really are, even if you look, especially if they come from relatively less popular software.


Could put the file(s) on a linux machine and run `file` on them


I expect that'll usually just give you a lot of "data" (i.e. some binary format that file doesn't recognize)



I don't know, the more time passes the more I convince myself that the net benefits of antivirus software do not (and maybe never did) exceed their downsides. Over the decades I've heard so many stories about AV software behaving suspiciously, using borderline shady tricks to monitor user activity, causing severe performance degradation, etc.


> (and maybe never did) exceed their downsides

There were certainly times when they were necessary: when Windows had nothing built in to defend itself, and for a time after that when those built-in features were crap.

Those times are pretty much over now, IMO. I'd go as far as to suggest that the market is now an attempted protection racket and hardware hawkers are complicit: things come pre-installed on new laptops and make very misleading claims about what might happen if you uninstall them instead of subscribing after the free trial period (ref: Dad got a new laptop recently; I went through and removed all the junk included with it, and I can see why people with little technical experience might just pay up).


> There were certainly times when they were necessary

Around the time of Windows 7 I stopped using anti-virus software and nothing happened. And for a long time before that, paid antivirus software (ESET NOD-32) wasn't finding anything. I think the simple rules of having a router with a firewall, not clicking random files, not using Internet Explorer, and keeping Windows up to date covered 99% of possible exploits, and the other 1% was luck.


Windows 7 was the first version that Microsoft was allowed (by the US/EU antitrust decisions) to have Defender installed out of the box in Windows. They were trying to do that as far back as XP and "Security Essentials" (the name before Defender) was a free and easy download on XP and Vista, but without the protection racket style marketing it was easy to miss and not everyone realized that there was a nice, quiet Windows anti-virus tool from Microsoft just a click away. I still think bundled anti-virus was one of the stupider things the antitrust decisions blocked and it gave the protection rackets a few more years to solidify power that they didn't need.


Actually, an antivirus will only help if you already have malware (or if it scans downloads/websites); most of the time it will not prevent new malware.


The fact that a computer bought from a store, that you pay good money for, comes with this crapware installed shows that even when you do pay, sometimes you are still the product not the customer.


How about that time Avast gave you RCE by simply adding HTML to the CN field of an invalid certificate?[^1] Or when TrendMicro added an unauthenticated listener that would exec anything you sent it?[^2]

[1]: https://www.theregister.com/2015/10/06/google_zero_hacker_re... [2]: https://bugs.chromium.org/p/project-zero/issues/detail?id=69...


A couple of years ago I heard a podcast that referenced a study that polled two groups of people: those that (admittedly, SELF-)identified as "security professionals" and those that did not.

The data point was "what is the biggest thing your group does that the other does not?"

The professionals came in as "multi-factor auth". The non-professionals came in as "anti-virus".


There was a time when they made sense, but it's been many years since the benefits of running a third party antivirus program outweighed the drawbacks.


I remember our Java developers being very unhappy with an ESET requirement that literally halved compile performance on their Linux boxes.


Seemed to be a win-win for Windows and the AV companies. Windows would let cheap interns pump out code (autorun.inf) and anti-virus companies would throttle your computer looking for anything exploiting the garbage code. Windows Vista was like the high-water mark for a slapped together shit OS with its AV "protection" — "that place where the wave finally broke and rolled back".


There’s a deeper economic lesson in here. Essentially it could be summarised as: government cannot leave policing to the market.

In modern capitalist democracies with social benefits there’s a lot of debate/opinion about what the state should do versus leave to the market.

Imagine you establish a successful little utopia on an island somewhere. At first you outsource policing to a few strong and forthright friends, then as the place grows you let the market decide. But as the place really matures you end up having to bring policing “in house”, controlled by the state. There are still plenty (plenty!) of ways it can be corrupt… but when anti-virus / policing etc. is fully outsourced, over time its corruption levels completely saturate.

Similarly - a military power that relies on mercenaries/soldiers of fortune/contracted parties. It cannot work beyond a very short term. (Machiavelli goes into this in a lot of detail in ‘The Prince’ … he was super evil but he was no fool.)

Seems strange to say that Bill Gates should’ve read more Machiavelli and he would’ve avoided creating the leeching anti-virus industry for so long.


Love it.

Too bad there aren't enough Mac users to prompt a similar backlash against Macs littering every computer they visit on the network with .DS_Store and other turds.


This isn't remotely comparable. Those .DS_Store files are created in arbitrary directories by the Apple file manager or something. The SQLite temp files are created in the OS-specific temporary directory (e.g. C:/Users/username/AppData/Local/Temp or whatever on Windows) which is specifically intended for that purpose. SQLite isn't doing anything wrong; that's where it's supposed to store temporary data that doesn't fit into memory.

The problem is that virus scanners sometimes misclassify those temp files as belonging to malware apps, or sometimes they might be written by real malware apps, but even in the latter case, that only happens because the malware uses sqlite as a library. The malware isn't written by the SQLite authors, so complaining to them is pointless.


You misunderstood what the comment in the code was saying.

Firstly, it's not that anti-virus software was misclassifying it; it's that a particular one, McAfee, was USING SQLite. And more importantly, it didn't put the file(s) in the proper %temp% folder. Instead, it created them as `C:\temp\{name}.sqlite_`. This is why users found them suspicious and complained to the devs.

Also keep in mind that putting random files or folders in the root of C:\ was, while never recommended, a common practice in the 2000s (some still do it even today). So what McAfee was doing is hardly unheard of. But people still freak out when they see these in their C:\ (TBH, understandable).



They're not arbitrary at all. They're in directories visited by Finder, storing the open/closed state of subdirectories, and apparently other view data.

Apple, in its arrogance and backwardness, doesn't store those choices on the BROWSING user's computer; it stores them in the directories on the computer being browsed... where, by the way, they'll be trounced by the next Finder user who comes along.


KDE's Dolphin does the same thing with .directory


> Those .DS_Store files are created in arbitrary directories by the Apple file manager or something.

.DS_Store files come from Apple's file systems containing a separate data and resource fork. APFS can natively store the contents, but for foreign file systems/network shares, a .DS_Store file is created to store those attributes.


I don't have OCD, but the casing of .DS_Store annoys me a lot for some reason. I have turned on the option to display dotfiles in Finder and I see this god-awful name everywhere. It could be .DSStore, .ds_store, .DS Store, or even .DS_STORE; the current one is the worst.


> .DS_Store files come from Apple's file systems containing a separate data and resource fork

No, that would be ._<filename> files (containing resource forks, when the corresponding file is stored on a volume with a FS not supporting them natively), which are much rarer these days given that resource forks are not really used anymore.

.DS_Store contains Finder metadata, as far as I know (non-default icon sorting or positioning, categories, tags etc.), and they occur on all volumes, including Mac native filesystems. Finder just hides them by default.


I never knew that. How come they end up in Git repositories then? They shouldn't be visible to Git running on a native Mac filesystem?


They're just files, so git can see them. A good habit is to add them to the project's gitignore, or your global gitignore (or both).


It's a pet peeve of mine when people put stuff like this in a project's `.gitignore`. The proper place is either your global gitignore (probably appropriate for stupid stuff like this) or in your clone in `.git/info/exclude`.

`.gitignore` is for stuff that all developers need to ignore, like compiler output, and should be kept to the bare minimum.


> .gitignore` is for stuff that all developers need to ignore

But that is the case, all developers need to ignore the ".DS_Store" files. So by your own rule it is appropriate in there then?


I think the point is that only developers on Macs need to ignore them. If they properly do so, these files will never appear on other developer’s machines, and they therefore don’t need them in .gitignore.


It’s not that straightforward to ignore files in git without adding them to a local .gitignore, though.

Global .gitignore exists, but I just had to look it up again to refresh my memory – chances are, junior devs or hobby developers will never even consider it as an option.

That said, arguably ignoring .DS_Store should be the default in macOS builds of git.
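For anyone who has to look it up: setting up a global ignore file is a one-time config change. A minimal sketch (the path `~/.gitignore_global` is just a convention, not anything git requires):

```shell
# Tell git to consult a machine-wide ignore file in every repo
git config --global core.excludesFile ~/.gitignore_global

# List OS junk there once, e.g. macOS Finder metadata
echo ".DS_Store" >> ~/.gitignore_global
```

After this, `git status` stops reporting .DS_Store in any repo on the machine, without touching any project's `.gitignore`.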


I just teach people about global gitignores then they know. If someone were to commit something to an open source project, I'd reject it and teach them about it too. I'm all about the global gitignores. The ignore file can actually act as a form of documentation if kept tidy. It's a single source of truth of all the artifacts an app can produce.


He must have meant that you block something at its origin: .DS_Store is managed by the OS, so you block it at the system level; `.o` files are produced by the project's build, so you block them at the project repo level.


I must admit that my viewpoint is coloured by the fact that I work on repos where most devs use OSX.

I can do the principled thing as you describe it. And then I can be angry every time a new dev arrives who doesn't have their computer correctly set up yet, or someone reinstalls their machine, or anything like that.

Or I can do the pragmatic thing, add one extra line to the .gitignore file and forget about it forever.

In my opinion this is not a hill worth dying on.


I can see an argument for this being semantically correct, but in practice I think it's preferable to have the repo checkout in the closest configuration to correct for new developers who will start contributing.

And, for what it's worth, all developers need to ignore .DS_Store files regardless of whether Finder creates them or they get placed there when you unzip something created on a Mac, for instance.

In the best case, you spend extra time needlessly denying PRs and bothering people who already likely don't know git well enough to cleanly fix their commit. In the case of a repo large enough that you're not the only person approving PRs, I'd say it's almost inevitable that they eventually slip through.

Just my 2¢ from experience working in the industry.


TIL there's a global gitignore, thanks!


So they are not a resource fork, then.


They are visible to programs, for example if you run `ls -a` you will see them on macOS too. I reckon they're only transparent to Finder (where you can't see them even if you're showing hidden files).


.DS_Store files are totally out of control: https://twitter.com/rdohms/status/1667554818603245570


You mean like those Thumbs.db files Windows leaves on every computer it visits on the network?


Windows XP is the last Windows version that does this.


Not to mention that I think you'd have to be browsing in some non-list view for it to even be generated.


You can avoid creating .DS_Store files on network and USB volumes like so.

  defaults write com.apple.desktopservices DSDontWriteNetworkStores -bool true
  defaults write com.apple.desktopservices DSDontWriteUSBStores -bool true


Good tip, thanks.


.DS_Store has an origin story too, though not as entertaining: https://www.arno.org/on-the-origins-of-ds-store


Apple's perennial lack of interest in truly fixing Finder shows in the fact that they've left this ridiculous bug in there for 20+ years, despite yet another "ground-up rewrite" that was supposed to (or did) happen at some point in the 2010s.

My other favorite Apple filesystem cock-up is the inclusion of a "Contents" directory in every bundle, which never has any siblings. I mean... what kind of ass thinks that a directory needs a subdirectory to hold the contents of itself?

The one thing Apple fixed in Finder that annoyed the living #$%! out of me for all previous years was that you couldn't sort files WITH FOLDERS AT THE TOP. Of course the fix was incomplete, broken in some places at first, including your Desktop directory. But that did get fixed surprisingly quickly.


That's the fault of whoever owns the other computers. If a user ever littered files all over my computer I would revoke write access until they fix their shit.


  desktop.ini


Software is at least as much social as technology.


Why not use sub-directories inside a temporary directory instead of file name prefixes?



The relevant snippet:

   /*
   ** Temporary files are named starting with this prefix followed by 16 random
   ** alphanumeric characters, and no file extension. They are stored in the
   ** OS's standard temporary file directory, and are deleted prior to exit.
   ** If sqlite is being embedded in another program, you may wish to change the
   ** prefix to reflect your program's name, so that if your program exits
   ** prematurely, old temporary files can be easily identified. This can be done
   ** using -DSQLITE_TEMP_FILE_PREFIX=myprefix_ on the compiler command line.
   **
   ** 2006-10-31:  The default prefix used to be "sqlite_".  But then
   ** Mcafee started using SQLite in their anti-virus product and it
   ** started putting files with the "sqlite" name in the c:/temp folder.
   ** This annoyed many windows users.  Those users would then do a 
   ** Google search for "sqlite", find the telephone numbers of the
   ** developers and call to wake them up at night and complain.
   ** For this reason, the default name prefix is changed to be "sqlite" 
   ** spelled backwards.  So the temp files are still identified, but
   ** anybody smart enough to figure out the code is also likely smart
   ** enough to know that calling the developer will not help get rid
   ** of the file.
   */
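On a Unix-like system you can check for leftovers of this kind with a quick one-liner against the standard temp directory (the `etilqs_` prefix is the one the comment describes; `TMPDIR` falling back to `/tmp` is the usual convention):

```shell
# List any leftover SQLite temp files ("sqlite" spelled backwards)
# in the standard temp directory; ls fails silently if there are none.
ls "${TMPDIR:-/tmp}"/etilqs_* 2>/dev/null || echo "no etilqs_ files found"
```

Since SQLite deletes these prior to exit, finding any usually means some program that embeds SQLite crashed or was killed mid-operation.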



Or the link to the official repository: https://www.sqlite.org/src/file?ci=trunk&name=src/os.h&ln=57


>> So the temp files are still identified, but anybody smart enough to figure out the code is also likely smart enough to know that calling the developer will not help get rid of the file.

Building illuminati lairs now, are we!


It's called barrier to entry. The world was better when there was a higher barrier to doing things like, e.g., buying a plane ticket or getting on the internet. The word for the way people who had the means and/or knowledge to do these things comported themselves under the circumstances they encountered was "class".


Right, it was better when only those who deserved it could fly or get on the internet.

And the definition of “deserved it” is “having more money than everyone else”.

Who cares if that money was made by exploiting people or selling shonky goods or breaking promises or tricking people or otherwise being an asshole?

Go on, take that flight. Buy that shiny thing. You worked hard for that money. You’re better than they are. You deserve it.

/S


I disagree. What made Eternal September so painful was lack of intelligence, not lack of money. It is only a correlation that poor people also tend to be less educated. I don't care if my chatroom is flooded by people with just a little money, but I feel it very much if the average IQ suddenly drops by 40 points.


IQ points only indicate how good people are at doing IQ tests (and that they actually did one); they don't tell you anything else. And looking down on people with perceived low IQ is elitist and classist.


I agree with you that IQ tests are kinda useless. Also, I think it's a misconception that IQ tests test intelligence. Taking the test depends on many external and internal factors: mood, how rested and sated you are, how comfortable you are while taking the test. Those things in turn affect your patience, your concentration, and your ability to resist environmental distractions, etc.

Oftentimes experience can affect these IQ tests. One can easily warm up to IQ tests; I know you're not supposed to be able to, but it's totally doable. Do two in a row, and after learning the answers to the first one, you'll do loads better on the second (after a break, so your concentration isn't depleted if you struggle with that).


Oh, /s indeed.

No, no. Getting on the internet wasn't based on having money for it. It was based on figuring out how to wire a 300 bps modem to a phone receiver. And to do that you needed a bit of a brain. And you likely weren't going to use it to share photos of your boo and shop for your next sugar-whatever, or share your dumb recipes. If you were capable of doing it, you might have valid opinions about what was going on in the world that would be worth debating or comparing with others... and anyway, there wasn't enough bandwidth for you to be both a blank, ignorant serf and to have your brain hijacked by masses of political operatives. No, the economy of scale just hadn't developed yet to deem the opinions of each individual member of the uneducated mass to be worth the effort of advertising to [each one] separately at the time.

[edit] Also, I'm assuming you can afford now to be on the internet and to take flights, so why can't you simply show the same amount of class that people showed when it was more difficult for them and they had to act a bit better to do so? Would that be too much to ask? Or is being a slob and intellectual degenerate some sort of proof that you're morally superior by way of being poor and not needing material things? You do realize that no matter what era we live in, the rich have nicer, shinier toys, right? What I meant by "class" was, as I said, the way you choose to comport yourself; it has nothing whatsoever to do with money.


You seem to be taking the position that somehow privilege or “class” can be improved by creating artificial barriers to entry, but I can tell you for a fact that even back in the dim dark days of dialup BBS, long before the IP protocol, there were assholes and trolls online. Technical skills only made them more arrogant.

A lack of class and comportment online is not new, and the shiny accoutrements and signifiers of “class” can as easily hide malign intent as demonstrate worth. (also: people shared all kinds of crap online, even then).

If you limit the internet to people who can wire up a 300 baud modem, then you’ll just end up with trolls who can solder.

I’m happy that everyone can fly, and can get online. How else are they going to learn about the world? I don’t have to hang out where they hang out if I don’t want to. That’s why I’m on HN and not FB. But they have every right to take advantage of these things, and I still have faith in humanity.

Anyway, without travel and information, how else are they going to learn that the world is filled with artificial barriers intended to keep them in their place?


I can see how the barrier to getting onto the internet might have been more intellectual than financial.

I am utterly at a loss about what you think the intellectual barrier to buying a plane ticket is.


Are you responding to me? Because I am responding to this:

> The world was better when there was a higher barrier to do things like, e.g., buying a plane ticket or getting on the internet.


Yeah sorry, I meant to respond to the other guy


100%


Looks like the line numbers were lost: https://github.com/mackyle/sqlite/blob/18cf47156abe94255ae14...

It's because McAfee started using SQLite, angry users would stumble upon the files, do a minimum of searching or thinking, and be furious at SQLite developers.



I've always wondered if Daniel gets a particular amount of email of this sort because his address includes "haxx"--for someone without tech savvy, it's not all that much of a jump to assume an email that includes something sounding like hacking is the email of the hacker that has attacked them.


Yes, that seems to be the likely explanation at least for the second linked email which closes with:

”Please see attached screen dump for a screen shot of your contact info included in Spotify (or what more likely seems to be a hacked Spotify developed illegally by a third party).”


The last one has to be an elaborate troll. Only a 4chan user would type "RIP Terry A. Davis" anywhere.


What makes it so that only 4chan users would have empathy for the death of other programmers? Terry's death was quite tragic, and his work was often discussed in technical circles for its novelty, despite his eccentricities. Love him or hate him, no human being deserved the circumstances of his life or of his death.

RIP Terry A. Davis


Check the ‘apology’ follow-up. The person claims to be schizophrenic. Even the apology mail makes no sense, is full of paranoia, and reads like a group-stalking delusion.


I wonder why users were angry about some files in their temp folder. Did McAfee fail to cleanup those files, or were they too big?

Edit: more information here: https://www2.sqlite.org/cvstrac/wiki?p=McafeeProblem Apparently, McAfee kept those files locked when it was using them, so the files couldn't be deleted and people got angry that they couldn't clean them up. Sounds like a loud minority to me.


There's some level of power user that will try to fix things, not understand what's going on, and yell at the world when they break things further.

I used to have a popular freely available 3rd party DLL that was included in lots of software packages. Because I was silly, it had my email address in the metadata that'd show up if you clicked "Properties" in Explorer. I'd get plenty of emails from random people asking for and sometimes _demanding_ help with software I've never heard of. I'm sure if I had an easy to find website with my name and a forum on it, it'd be full of such angry comments.


> There's some level of power user that will try to fix things, not understand what's going on, and yell at the world when they break things further.

That appears to me to be quite like a Chesterton fence. Everyone encounters these eventually.

https://en.m.wiktionary.org/wiki/Chesterton%27s_fence


It's a good thing that there's a clear instructional video on how to remove McAfee antivirus from your computer, then:

https://www.youtube.com/watch?v=bKgf5PaBzyg

(yes, it's the funny one)


In Linux, these files are unlinked, so they are invisible in the filesystem. It is legal to unlink an active file descriptor, but continue reads/writes to it.

I think that lsof can still see these temporary files; I'm not sure how I first noticed it.

Windows implements a POSIX kernel layer, so perhaps this functionality could be coaxed out of it.
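The unlink-while-open behavior is easy to demonstrate with plain shell redirection; a minimal sketch (filename and message are arbitrary):

```shell
# Create a temp file and hold it open on fd 3
tmp=$(mktemp)
echo "still here" > "$tmp"
exec 3< "$tmp"

# Unlink it: the name disappears from the filesystem,
# but the inode survives as long as fd 3 stays open
rm "$tmp"
[ -e "$tmp" ] || echo "name is gone"

# The open descriptor can still read the data
cat <&3

# Closing the last descriptor finally frees the storage
exec 3<&-
```

This is exactly why `du` can't account for the space such files occupy while `lsof` (which walks open descriptors, not directory entries) still shows them.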


I knew two guys long long ago that used to circumvent the per user disk quota on shared machines, which was tiny and not conducive to power users. One found a deep dark corner of the file system, and used it to hold binaries the rest of us used. I think his idea was that if it wasn’t for personal gain it was easier to answer difficult questions, which at some point came up and he got a pass, since it would take more disk space if we had private copies.

The other was keeping file handles open to deleted files, as you describe. I don’t recall how this worked, but I suspect it involved uncompressing data into a file descriptor, then reading it back. I guess as long as his terminal window was open it was more stable than tmp (he may have also been using screen).


> I don’t recall how this worked

Unlinking files reduces their reference count. Once the reference count reaches zero the file is considered deleted and the space can be reclaimed. Every open file descriptor and hard link increases a file's reference count. So if you've got a file descriptor open on a file in a background task (daemon, nohup, screen, etc) and unlink it with rm or something the file's reference count will decrement but not go to zero. Only when that program closes the file descriptor will the reference count go to zero and the file actually be deleted.


I had an interesting issue with that the other day:

Someone at my company thought it a good idea to launch a background process from within a cron job, starting it with `&` at the end so that it detached from cron and got reparented to init.

Fine, but the program kept its stdout open and dumped its output there. Guess what: that stdout fd pointed at a temp file which cron deleted, but which the program still held open.

I ended up with a full / on all production servers, with no culprit visible when running `du`. Only by running `lsof | grep deleted` did I find these huge deleted temp files.

Killing the process and switching to a saner systemd service was the savior.
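The situation is easy to reproduce and diagnose on Linux (a sketch assuming `/proc` is mounted; `lsof +L1` also lists open files with zero remaining links):

```shell
tmp=$(mktemp)
tail -f "$tmp" &                      # some process holds the file open
pid=$!
rm "$tmp"                             # unlinked: invisible to du, space not freed
ls -l /proc/$pid/fd | grep deleted    # the fd symlink target shows "(deleted)"
kill $pid                             # only now can the space be reclaimed
```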


If a file is opened with `FILE_SHARE_DELETE` then it can be deleted while held open. This flag has existed since forever. It's just that unlike Linux this isn't the default, and no developer would think of setting it. I think the only common software I saw using it when I still used Windows was the media player mpv.


When YouTube was Flash-based, that's how Flash would save its cache file for the stream. To save a video you would go looking for which file descriptors were in use and yank them out of (I believe) /dev/fd and back into the filesystem. On Windows the file was visible but locked behind the normal Windows file-locking shenanigans. To copy locked files on Windows I used a program called HoboCopy that used Volume Shadow Copy tricks to work around the locks.


Apparently, recent Windows uses POSIX semantics by default: https://news.ycombinator.com/item?id=23745019


lsof shows “open file handles” which can include deleted files. Can be useful when you have a disk showing tons of usage but you can’t find any big files.


> Sounds like a loud minority to me.

Imagine for a moment the cross section of Windows users that 1) have McAfee running, 2) poke around the Windows temp folders and/or run some sort of "cleaner" to "optimize" their system, and 3) don't recognize the name "sqlite". I can imagine the exact type of user: peak Dunning-Kruger and utterly insufferable. Even just seeing their e-mails before deleting them would be maddening. I think most people on HN can vividly imagine the sort, or remember back when they were that insufferable twat.


I was that person once. I was 12 and couldn't understand why MS-DOS disk maintenance tools shouldn't be used on PC-DOS because obviously repair tools can only fix things


That's interesting. Some DOS clones had a different FAT format?


Children really are peak Dunning-Kruger.


Somewhat related - I’m very very curious to hear a detailed account of someone who uses SQLite for a production app with high traffic.

For the embedded use case I think it’s a slam dunk, but there are many interesting use cases for server side but they all seem to be toyish.

The locking behavior of SQLite is somewhat problematic unless you use WAL, and even then it's not perfect.


> there are many interesting use cases for server side but they all seem to be toyish.

> The locking behavior of SQLite is somewhat problematic unless you use WAL and even then not perfect

SQLite with WAL and `synchronous` configured appropriately will insert a row in ~50µs on NVMe hardware. This is completely serialized throughput (i.e. the highest isolation level available in SQL Server et al.). At scale, this can reasonably be described as "inserting a billion rows per day".

I have yet to witness a database engine with latency that can touch SQLite. For some applications, this is a go/no-go difference. Why eschew the majesty of SQL because you can't wait for a network hop? Bring that engine in process to help solve your tricky data problems. I'd much rather write SQL than LINQ if I have to join more than 2 tables.

We've been exclusively using SQLite in production for a long time. Big, multi-user databases. Looking at migrating to SQL Server Hyperscale, but not because SQLite is slow or causing any technical troubles. We want to consolidate our installs into one physical place so we can keep a better eye on them as we grow.

Fun fact: SQL Server "Hyperscale" is capped at 100MB/s on its transaction log. I have written to SQLite databases at rates far exceeding this.
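For reference, a minimal sketch of the configuration being described, using Python's stdlib bindings (the ~50µs figure is hardware-dependent; `synchronous=NORMAL` is one common pairing with WAL):

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
conn.execute("PRAGMA journal_mode=WAL")        # readers no longer block the writer
conn.execute("PRAGMA synchronous=NORMAL")      # fsync at checkpoint, not per commit
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

start = time.perf_counter()
for i in range(10_000):
    conn.execute("INSERT INTO t (v) VALUES (?)", (f"row {i}",))
rate = 10_000 / (time.perf_counter() - start)
print(f"{rate:,.0f} serialized inserts/sec")
conn.close()
```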


Is your application a networked one?


It is a web app that has exclusive ownership over its SQLite databases. Exactly one SQLiteConnection instance for the lifetime of the application.


Does that mean you have to bring the whole app down if you need to manually insert something in sql?


I am not an expert, but if I had written an app that was designed like that, the app would have a /debug url that let me log in[0] and then do all kinds of fun things[1], including running raw SQL from the app's context.

[0] And I would be very careful about the security angle, because this is effectively a very dangerous back door into the whole system. Useful, but something to be careful about.

[1] Dump live activity stats, performance info, ability to run arbitrary queries against any database(s), ability to run arbitrary code in the app itself if available in a reasonable way (ex. if the app is in a dynamic language, just an eval(), possibly embed tcl/lua if not).


Multiple processes can open the same SQLite file at the same time.


Interesting. Any more info or link?


A super smart guy on my team at a previous job replaced around $100,000 of server and SAN hardware with a 3-node Flask app running on 2 cores and 4 GB of RAM each, performing the analytics work just in time. The system it replaced (admittedly an incredibly absurdly designed one, well before our time) was built using MySQL filtered replication triggering stored procedures in a magical Rube Goldberg-ian dystopia. The app used in-memory SQLite3 tables to perform the same work as the stored-procedure operations, and cost about 50ms of extra time per request for a feature of the app that was rarely used.

Admittedly, not high traffic like you asked, but one of my favorite uses of SQLite hands down.


Apps aren’t divided into “high-traffic” and “toys.” There are plenty of use cases where you have a low-write server in a production environment, and SQLite would work fine there.

If you need high write volume, then yes, the locking behavior means SQLite is not a good fit.


SQLite can easily hit 15k INSERTs per minute or more (setting processor affinity to a single core helps drive the max rate up). However, if a process begins a transaction and then stalls, it halts all DML.

I think performance can be good, as long as a competent schema design is in place. Allowing ad-hoc queries from less trusted users will surely tank performance.
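The stall described above is easy to demonstrate: one connection holding an open write transaction blocks every other writer until it commits (a sketch using Python's stdlib bindings):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit mode
writer.execute("CREATE TABLE t (x INTEGER)")

writer.execute("BEGIN IMMEDIATE")            # take the write lock...
writer.execute("INSERT INTO t VALUES (1)")
# ...and "stall" here without committing.

other = sqlite3.connect(path, timeout=0.2, isolation_level=None)  # give up after 200ms
try:
    other.execute("INSERT INTO t VALUES (2)")
except sqlite3.OperationalError as e:
    print(e)                                 # database is locked

writer.execute("COMMIT")                     # releasing the lock unblocks DML
other.execute("INSERT INTO t VALUES (2)")    # now succeeds
```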


Not a ton of detail, but Tailscale uses it: https://tailscale.com/blog/database-for-2022/


There are plenty of use cases of SQLite being used in production servers, usually for cache locality of small datasets (think more complex read-focused Redis use cases) or for direct data operations (transforms, for instance).

That being said; even if it weren't usable in a web service space, does that make it any less reasonable of a database? That whole mentality sounds like a web-developer-centric one. Berkeley DB was used for decades for application and system databases, a field in which SQLite largely replaced it. And one that MySQL, Postgres, Oracle, etc are generally completely unsuited for. It's the same reason Microsoft offered MS Access for so long alongside MSSQL, until MSSQL had its own decent embeddable option to deprecate it.


Cloudflare's D1 is built on top of it https://blog.cloudflare.com/introducing-d1/


I've been pushing to try SQLite as a "sacrificial memoization" layer. Basically we have two tasks separated by 5-10 days. When we do the first task, we calculate a bunch of information as a side effect. At the second task, we don't have that information, and trying to reconstruct it without the original context is very slow, because a lot of it depends on temporal state: what was happening 5-10 days ago.

The other use case I'm eager to explore is as persistence of status data in a long-running process. Occasionally the task blows up halfway and although we can recover the functional changes, we lose the reporting data for the original run. If we save it in a SQLite database instead of just in-script data structures, we don't have to try to reverse engineer it anymore.

In both cases, I like the idea of "everything's one file and we nuke it when we're done" rather than "deal with the hostile operations team to spin up MySQL infrastructure."


redis could possibly work, too, depending on how you actually plan to do it


You should look into temporal.io. Your workflow problem is highly suited to it.


Yes, the sqlite defaults are quite terrible out of the box. I'm not sure why they never changed them; it will start choking at 5k inserts where other dbs can do 100x that (and so will sqlite in WAL mode with a few other settings). Getting it to perform well in high-traffic scenarios would be a lot of effort. I struggle to get it to be vaguely performant in embedded use cases and often roll my own poor man's version unless I really care about data integrity (which is rare).


> I'm not sure why they never changed them

Because SQLite is run in many different environments and scenarios, and what's terrible in one scenario is perfect for another. There are no defaults that will work for everyone. This also applies to MySQL, PostgreSQL, etc. but the range of scenarios for those is more limited (no embedded for example) so the defaults are a bit more tuned to what's suitable for your scenario.


Isn't it more rare to not care about data integrity? Being unsure what state your data is at any point in time does not seem like a safe scenario.


High-volume single-writer + WAL mode is perfectly doable for a lot of applications, but you have to keep the concurrency model in mind. I think of it more like a really advanced data structure library than a "database" in that sense.

An old product I worked on (which is still around) used (still uses?) SQLite for storing filesystem blocks that came from the paired backup software. A single database only contained the blocks backed up from a single disk (based on its disk UUID.) So, this is a perfect scenario since there's only one writer at a time because there's only one device and one backup on that device at a time. We could write blocks to SQLite fast enough to saturate the IOPS of the underlying device and filesystem containing the database. This was over 10 years ago. Very practical and durable and way better than anything we could do in house.

Today, you can definitely do GB/s on NVMe class hardware with the right pragma settings and the right workload. So for certain classes of multi-tenant solutions I think it's not so bad, you can just naturally have a single writer to a single data store (or can otherwise linearize writes) and the raw performance will be excellent and more than enough.



Do they have any customer success stories about sqlite? I've mostly seen (compelling and well written) marketing posts. Details from someone using the product are much more interesting.


From what I can tell I haven’t seen anything on their end customers using it much


Wasn't there an article on here within the last week saying how SQLite is "perfect for the 'edge'?"


That was more in the vein of a toy example/hypothetical.


This is a bad choice, I wouldn't understand that it needs to be read backwards. Why not use process's executable name instead?


Did you even open the link?

Human readability was not a goal, in fact quite the opposite.


It's a horrible link. It points to a specific commit... Okay, so it must have something to do with the commit? Nope, the commit is putting a limit on the number of symlinks to resolve, for whatever reason. Then you also notice that the link is simply a link to the file, not a specific line (something like https://github.com/mackyle/sqlite/blob/18cf47156abe94255ae14...), so you are expected to read the whole C file to figure out why.


or you just ^F and search for the string you're looking for and enjoy a good laugh :)


I meant use the name of the program that embeds SQLite, for example, McAfee, Google Chrome etc. This way the user could easily understand which program has created the files.


How does it get that in a cross-platform way? Or what if the program name has exotic characters?


I think that in 2023 every decent OS should provide a method to get executable name. And every decent filesystem supports exotic characters.


Why spend time and effort on all of that when applications can just configure it themselves if they want to?


If it were possible to determine the program name in a way that was portable and not too painful, it would be a nice feature for the library to automatically set a better default, both to save work for devs using it and to save sqlite devs the hassle of their library getting blamed for things that aren't its fault. Now, I don't think those conditions are likely to be met, but that doesn't mean it wouldn't be nice if it were practical.


> If it were possible to determine the program name in a way that was portable

Every decent OS in 2023 should provide a method for this.


>Every decent OS in 2023 should provide a method for this.

Yes, that's rather the problem—there isn't just one way to do that, which makes it a portability issue.



