

OS X LevelDB Corruption Bounty: 10.00 BTC + 200.2 LTC - cypherpunks01
https://bitcointalk.org/index.php?topic=337294.0;all

======
pudquick
... are Apple's manpages never read?

[https://developer.apple.com/library/mac/documentation/Darwin...](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fsync.2.html)

"For applications that require tighter guarantees about the integrity of their
data, Mac OS X provides the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the
drive to flush all buffered data to permanent storage. Applications, such as
databases, that require a strict ordering of writes should use F_FULLFSYNC to
ensure that their data is written in the order they expect. Please see
fcntl(2) for more detail."

[https://developer.apple.com/library/mac/documentation/Darwin...](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fcntl.2.html)

"F_FULLFSYNC - Does the same thing as fsync(2) then asks the drive to flush
all buffered data to the permanent storage device (arg is ignored). This is
currently implemented on HFS, MS-DOS (FAT), and Universal Disk Format (UDF)
file systems. The operation may take quite a while to complete. Certain
FireWire drives have also been known to ignore the request to flush their
buffered data."

OS X has aggressive file buffering in memory, and it's getting more aggressive
all the time. For example, cfprefsd, introduced in 10.8
([https://developer.apple.com/library/mac/releasenotes/DataMan...](https://developer.apple.com/library/mac/releasenotes/DataManagement/RN-CoreFoundationOlderNotes/)),
made it so that when a system application read a preferences file, the
preferences stayed in memory and the on-disk version was ignored until
cfprefsd eventually synced them back to disk. In 10.9, the behavior is _much_
worse, to the point that once a pref is in cfprefsd, it's unlikely to leave
until the user logs out or the machine reboots.

In this instance, OS X has, for quite some time, had "defrag on the fly" for
files under 20MB in size. On access, the file is read into memory and kept
there in its entirety until memory pressure from other processes triggers a
sync back to disk. When it comes to writing a small file back to disk, OS X
will "get around to it" when it's damned well ready, unless you force its
hand using the fcntl options above.
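Here is a minimal C sketch of forcing that hand, assuming a POSIX system. `full_fsync` is a hypothetical helper name. It tries `fcntl(fd, F_FULLFSYNC, 0)` first. It falls back to plain `fsync()` in two cases: when the fcntl is rejected, since the manpage says it is only implemented on some filesystems, and when `F_FULLFSYNC` isn't defined at all, as on Linux. `append_and_commit` is also hypothetical. It illustrates the strict write ordering that the fsync(2) manpage mentions: the record is fully flushed before the commit marker that vouches for it is written.

```c
#include <fcntl.h>   /* fcntl(); F_FULLFSYNC is defined here on macOS */
#include <string.h>  /* strlen() */
#include <unistd.h>  /* write(), fsync() */

/* Flush fd as far as the platform allows. On macOS, F_FULLFSYNC also asks
 * the drive to flush its own buffered data; where F_FULLFSYNC is not
 * defined (e.g. Linux) only fsync() is compiled in.
 * Returns 0 on success, -1 with errno set on failure. */
int full_fsync(int fd)
{
#ifdef F_FULLFSYNC
    /* Only some filesystems implement this fcntl, so if it is rejected,
     * fall back to the weaker plain fsync() below. */
    if (fcntl(fd, F_FULLFSYNC, 0) == 0)
        return 0;
#endif
    return fsync(fd);
}

/* Write all of buf, looping over short writes; -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical two-step commit: the record must be durable before the
 * marker that vouches for it, so a full flush sits between the two
 * writes, and another one follows the marker. */
int append_and_commit(int fd, const char *record, const char *marker)
{
    if (write_all(fd, record, strlen(record)) != 0 || full_fsync(fd) != 0)
        return -1;
    if (write_all(fd, marker, strlen(marker)) != 0 || full_fsync(fd) != 0)
        return -1;
    return 0;
}
```

The fallback is a design choice. If F_FULLFSYNC fails, you silently get fsync()'s weaker guarantee rather than an error. A stricter caller might prefer to report that failure instead.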

Unfortunately, the bit about _"This is currently implemented on HFS, MS-DOS
(FAT), and Universal Disk Format (UDF) file systems"_ covers pretty much the
range of filesystem types that OS X can natively read+write, but one that
might slip past it is ExFAT. I'd be surprised if that were the case, but ExFAT
is natively supported read+write on OS X, so it would be quick and easy to
test (set up an ExFAT volume for the database) and possibly confirm as the
root cause.
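For the ExFAT experiment, a quick probe can at least show whether a volume rejects the request. This is a sketch under assumptions. `probe_fullfsync` and the `.fullfsync_probe` scratch-file name are hypothetical. The probe also assumes that an unsupported filesystem reports an error from `fcntl()` rather than silently succeeding. Even an accepted call does not prove that the drive actually flushed its buffers; the manpage itself warns that some FireWire drives ignore the request.

```c
#include <errno.h>   /* errno */
#include <fcntl.h>   /* open(), fcntl(); F_FULLFSYNC on macOS */
#include <stdio.h>   /* snprintf() */
#include <string.h>  /* strerror() */
#include <unistd.h>  /* write(), close(), unlink() */

/* Hypothetical probe: create a scratch file in dir, write one byte and
 * request F_FULLFSYNC on it. Returns 1 if the request was accepted, 0 if
 * it was rejected or F_FULLFSYNC is not available on this platform, and
 * -1 if the scratch file could not be created or written. A short reason
 * is stored in why. */
int probe_fullfsync(const char *dir, char *why, size_t why_len)
{
    char path[4096];
    snprintf(path, sizeof path, "%s/.fullfsync_probe", dir);

    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0) {
        snprintf(why, why_len, "open: %s", strerror(errno));
        return -1;
    }

    int result;
    if (write(fd, "x", 1) != 1) {
        snprintf(why, why_len, "write: %s", strerror(errno));
        result = -1;
    } else {
#ifdef F_FULLFSYNC
        if (fcntl(fd, F_FULLFSYNC, 0) == 0) {
            snprintf(why, why_len, "F_FULLFSYNC accepted");
            result = 1;
        } else {
            snprintf(why, why_len, "F_FULLFSYNC rejected: %s", strerror(errno));
            result = 0;
        }
#else
        snprintf(why, why_len, "F_FULLFSYNC not defined on this platform");
        result = 0;
#endif
    }

    /* Clean up the scratch file whatever happened. */
    close(fd);
    unlink(path);
    return result;
}
```

Run it against a directory on the volume under test, for example `probe_fullfsync("/Volumes/TEST", why, sizeof why)` (the mount point is only an example). Then print `why` and compare the result with the same call on an HFS+ volume.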

(Additionally, third-party read+write access to filesystems like NTFS via
Paragon / Tuxera may be able to confirm this as well.)

More reading material (MySQL has been dealing with this since 2005):
[http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072...](http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html)

~~~
caf
This appears to be benchmark gaming - the POSIX Rationale for fsync(2) says:

_The fsync() function is intended to force a physical write of data from the
buffer cache, and to assure that after a system crash or other failure that
all data up to the time of the fsync() call is recorded on the disk. Since the
concepts of "buffer cache", "system crash", "physical write", and
"non-volatile storage" are not defined here, the wording has to be more
abstract._

~~~
antirez
In other words, what are OS X's default fsync() semantics useful for? I had
the same discussion on Twitter a few days ago...

~~~
Someone
It is useful for forcing all writes out to the storage device. If your device
is battery/UPS-backed and has enough capacity to flush its buffers (to disk or
to flash memory) after a power loss, that is sufficient to (eventually) get
your data on disk (yes, the drive may fail, but if that happens after the data
has hit the platter, you have no guarantees either).

From what I understand, that behaviour is in spec (for me, borderline, at
best, but I don't make that spec) according to
[http://pubs.opengroup.org/onlinepubs/009695399/functions/fsy...](http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html)
(_"physical write from the buffer cache"_, not _"physical write to the
disk"_) and, AFAIK, is what others do, too
([http://ridiculousfish.com/blog/posts/mystery.html](http://ridiculousfish.com/blog/posts/mystery.html))

Edit:
[http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072...](http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html),
referenced from that ridiculous fish post, gives more background info.

~~~
antirez
OK, that makes sense in special cases, but it's a really unsafe default...

------
guyht
It's truly amazing to see such a large bounty for an open source project. Even
though this bug poses no security threat, the reward offered is akin to those
offered by the big powerhouses (FB, Microsoft, Google) for providing fixes to
critical security issues.

~~~
nwh
It's an infuriating bug: there's no real reason for it and no pattern to what
triggers it. Some users see it daily; others never see it. It's been around
long enough to seriously irritate some users, hence the bounty.

~~~
nullc
In particular, it seems to not manifest itself for the technical folks who are
likely to actually solve it if they can reproduce it and yet be quite frequent
for others.

I wouldn't be shocked if it ultimately turned out to be due to some setting
that gurus would never have enabled. :)

------
jliptzin
Stack Overflow should let you attach BTC bounties to high-priority questions,
like in this thread.

~~~
maaku
Monetary incentive actually decreases quality in situations like this. It
devalues altruistic contributions (see: _Drive: The Surprising Truth About
What Motivates Us_)

~~~
_delirium
I've definitely found that for myself, especially for smallish amounts of
money. If it's enough money that I can justify it as freelancing, then it's a
different category entirely: I'll do a good job in return for being paid. But
I'd rather do things I'm interested in without any money involved (writing
Wikipedia articles, answering questions I know something about, etc.) than
chase $2 bounties. That ends up making it feel not like an interesting hobby,
but like a low-paid job, like spending all day on Mechanical Turk.

As a proprietor of such a system you also end up with a whole category of new
bad behaviors, as people try to maximize their hourly pay by finding ways to
game the system, getting the most payout for the least input.

~~~
maaku
One thing that does work is turning it into a game: non-monetary reputation
points, badges, etc. No surprise, this is what Stack Overflow has pioneered
(and Wikipedia should take note).

~~~
_delirium
I actually tend to find those pretty demotivating/bad-behavior-inducing as
well. Reputation in the actual sense is one thing (getting a reputation for
being a good contributor), but I really dislike karma/points-style
"scorekeeping" and up/downvotes. Fortunately on HN nobody takes it seriously,
or it could be a problem.

Wikipedia does have miscellaneous scores and badges, of which I find the
badges awarded by community members as recognition of a contribution most
useful:
[https://en.wikipedia.org/wiki/Wikipedia%3ABarnstars](https://en.wikipedia.org/wiki/Wikipedia%3ABarnstars)

There's also just raw counts of contributions, which everyone takes with a
large grain of salt:

[https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_...](https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_article_count)

[https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_...](https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits)

You can also click "thank" next to specific edits, which just sends the person
a notice that someone appreciated their edit. That I think is useful, but I'm
not sure how useful it would really be to keep a score of "number of thanks"
or whatever. The point is just to say "hey someone noticed you did a good job
here and appreciates it" to give some encouragement, not to keep score of who
got thanked more.

And finally many people just collect vanity "hey look at what I've done!"
lists, which can be a nice way of reflecting on your contributions and feeling
good about them. Many people's User Pages are like that, or you could do it
externally like e.g.
[http://www.gwern.net/Wikipedia%20resume](http://www.gwern.net/Wikipedia%20resume)

------
throwwit
Just checking: is it normal for the bitcoin-qt client to upload a burst of 1GB
to an IP on startup? Haven't seen it before.

~~~
locusm
There are wallets around that use a shared blockchain too. Otherwise the
download can be huge.

~~~
brymaster
cryptocoin, I have 'showdead' on and you're hellbanned. Welcome to HN!

~~~
cryptocoin
I have no idea what that is but I will be glad to leave HN if that is the
case, you're managing to do worse than bitcointalk.

~~~
nwh
You're no longer hellbanned.

------
pirateking
Since the issue states that all reports seem to be from OS X 10.8.X on, I do
wonder if it is related to changes that were introduced around 10.8[1][2].
Could FileVault or default Gatekeeper permissions also be involved?

I don't know much about LevelDB or the Bitcoin client, but I am currently
taking a closer look at OS X first.

[1]
[https://developer.apple.com/library/mac/releasenotes/General...](https://developer.apple.com/library/mac/releasenotes/General/APIDiffsMacOSX10_8/Kernel.html)

[2]
[https://developer.apple.com/library/mac/releasenotes/macosx/...](https://developer.apple.com/library/mac/releasenotes/macosx/whatsnewinosx/Articles/MacOSX10_8.html#//apple_ref/doc/uid/TP40011634-SW4)

~~~
archagon
I don't know much about this issue, but just a shot in the dark: Apple is
using a new technology called Core Storage that's "layered between the
whole-disk partition scheme and the file system used for a specific
partition" in order to manage FileVault and their Fusion Drives. It's also
apparently a technology that's in flux, as there are certain
incompatibilities with regard to Fusion Drives (possibly custom-created
ones?) with the release of Mavericks. John Siracusa mentioned this in a recent
podcast, and even hypothesized that Core Storage might be a stop-gap between
HFS+ and a new Apple-designed file system. Could it have something to do with
this?

------
Moral_
Why don't they just go directly to the ioctl instead of through all these
libraries?

Whenever I need something written to disk _immediately_, I go straight to
the drivers:

    
    
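    #include <stdio.h>      /* perror(), NULL */
    #include <sys/ioctl.h>  /* ioctl() */
    #include <linux/fs.h>   /* BLKFLSBUF (Linux block-device ioctl) */
    
    if (ioctl(fd, BLKFLSBUF, NULL))  /* fd: an open block device, e.g. /dev/sda */
        perror("BLKFLSBUF failed");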
    

that should work.

