How does checksumming help if the data is still in cache, waiting to be written?
For example:
I have 1 MB of data. I write it, but it stays in the buffer cache after the write, so when you compute the checksum you are computing it on the buffer cache.
On Linux you have to drop_caches and then read the data back to get a checksum you can be sure of. As far as I know, there is no per-buffer or per-file drop_caches. If you do a system-wide drop_caches, you invalidate the good pages along with the bad ones.
What if the device maintains its own cache in addition to the buffer cache?
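For what it's worth, Linux does seem to offer a per-file hint that gets close to this: posix_fadvise() with POSIX_FADV_DONTNEED. Below is a minimal sketch, assuming Linux and a regular file. The name dropFileCache is made up for illustration. The call is only advisory, so the kernel may still keep some pages, and it does nothing about a cache inside the device.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: ask the kernel to drop the cached pages of ONE file.
// Advisory only; pages may remain cached, and device caches are untouched.
bool dropFileCache(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    // Dirty pages are not discarded by DONTNEED, so flush them first.
    bool ok = (fsync(fd) == 0);

    // offset = 0 and len = 0 mean "the whole file".
    // posix_fadvise returns 0 on success (it does not set errno).
    if (ok) ok = (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0);

    close(fd);
    return ok;
}
```

Something like dropFileCache("/path/to/file") before re-reading would at least avoid throwing away everybody else's cached pages.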
How do you know you put good data into the cache in the first place?
There's always going to be a place where errors can creep in. There are no absolute guarantees; it's a numbers game. We've got more and more data, so the chance of corruption increases even if the per-bit probability stays the same. Checksumming reduces the per-bit probability across a whole layer of the stack - the layer where the data lives longest. That's the win.
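To put rough numbers on that: if each bit is corrupted independently with probability p, the chance that at least one of n bits is bad is 1 - (1 - p)^n. A small sketch, where the function name and the value of p are purely illustrative and not measurements of any real hardware:

```cpp
#include <cmath>

// Probability that at least one of n independent bits is corrupted,
// given a per-bit corruption probability p: 1 - (1 - p)^n.
// Written with log1p/expm1 so tiny values of p do not round away.
double probAnyCorruption(double p, double n) {
    return -std::expm1(n * std::log1p(-p));
}
```

With an assumed p of 1e-15, probAnyCorruption(1e-15, 8e12) (about 1 TB) comes out near 0.8%, while probAnyCorruption(1e-15, 8e15) (about 1 PB) is above 99.9%. Same per-bit probability, very different outcome.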
I was asking this with open(<file>, O_DIRECT|O_RDONLY) in mind;
that bypasses the buffer cache and reads directly from the disk, which at least solves the buffer cache part, I guess. The disk cache is another matter: if we disable it, we are good, at the cost of performance.
I was pointing out that tests can do this kind of thing.
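A test along those lines could look roughly like the sketch below. It assumes Linux, a regular file on a filesystem that supports O_DIRECT, and a 4096-byte alignment. The names checksumFile and fnv1a are hypothetical, and FNV-1a is only a stand-in for whatever checksum is really in use.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// FNV-1a 64-bit, used here only as a placeholder checksum.
constexpr std::uint64_t kFnvOffset = 14695981039346656037ULL;

std::uint64_t fnv1a(std::uint64_t h, const unsigned char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Reads the file and checksums what comes back. With direct == true the
// read uses O_DIRECT and so bypasses the buffer cache (not the disk cache).
bool checksumFile(const char* path, bool direct, std::uint64_t* out) {
    int flags = O_RDONLY;
    if (direct) flags |= O_DIRECT;
    int fd = open(path, flags);
    if (fd < 0) return false;  // e.g. the filesystem rejects O_DIRECT

    // O_DIRECT needs an aligned buffer and an aligned read size;
    // 4096 is an assumption that covers common block sizes.
    const std::size_t kAlign = 4096;
    const std::size_t kChunk = 1 << 20;
    void* buf = nullptr;
    if (posix_memalign(&buf, kAlign, kChunk) != 0) {
        close(fd);
        return false;
    }

    std::uint64_t h = kFnvOffset;
    bool ok = true;
    for (;;) {
        ssize_t n = read(fd, buf, kChunk);
        if (n < 0) { ok = false; break; }
        h = fnv1a(h, static_cast<const unsigned char*>(buf),
                  static_cast<std::size_t>(n));
        // A short read is treated as end of file (assumes a regular file).
        if (static_cast<std::size_t>(n) < kChunk) break;
    }

    free(buf);
    close(fd);
    if (ok) *out = h;
    return ok;
}
```

The test would write the data, fsync, then compare checksumFile(path, true, &sum) against the checksum of the bytes it meant to write.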
I've been using an HP ZBook 15 G2 Mobile Workstation with Ubuntu 14 LTS for the last 3 months. The only thing I miss is an SSD, which I didn't choose in my customization. It has 16 GB RAM and an i7. I switch between Win 8.1 and Linux (something with LXDE instead of Unity). The nice thing is that Linux can run VMs, it is pretty fast, and even containers work well. Also, Win 8.1 ultimate has the Hyper-V role, so you can use that as a hypervisor as well.
Initially I had 64-bit 15.x, but some Unity programs used to crash when it was released, so I installed something good.
I haven't found any problem with anything so far.
>>>I won't go into the details that led up to this event<<<<
Fix the problem and then go ahead and do whatever you want. At least you would be helping one other person.
Once you have crossed the line, there are no rules, nothing to stop you. Problems are created by people or a group of people. Fix the problem and you will watch problems run
to the hills. Yes, you will have to pay a price for all this.
If you move that while (topK--) into the map loop, it becomes online code for topK, whereas what you wrote is offline. If you want offline, then pushing everything into a priority_queue and then popping it out would be much faster.
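Since the original code isn't shown here, this is only my reading of the two variants, sketched with made-up names (Entry, topKOffline, topKOnline) and assuming the map holds key-to-count pairs. Offline pushes everything into a priority_queue and pops k times. Online keeps at most k entries while walking the map, so the current top k is available at any point.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;  // (count, key)

// Offline: all counts are known up front. Build a max-heap, pop k times.
std::vector<Entry> topKOffline(const std::unordered_map<std::string, int>& counts,
                               std::size_t k) {
    std::priority_queue<Entry> heap;
    for (const auto& kv : counts) heap.emplace(kv.second, kv.first);

    std::vector<Entry> out;
    while (out.size() < k && !heap.empty()) {
        out.push_back(heap.top());
        heap.pop();
    }
    return out;
}

// Online: a min-heap bounded to k entries, updated as each item arrives.
std::vector<Entry> topKOnline(const std::unordered_map<std::string, int>& counts,
                              std::size_t k) {
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : counts) {
        heap.emplace(kv.second, kv.first);
        if (heap.size() > k) heap.pop();  // evict the smallest of the k+1
    }

    std::vector<Entry> out;
    while (!heap.empty()) {
        out.push_back(heap.top());
        heap.pop();
    }
    std::reverse(out.begin(), out.end());  // largest first, like the offline version
    return out;
}
```

Both return the same entries in the same order. The online one never holds more than k + 1 entries in the heap, which matters when the input is a stream rather than a finished map.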
Can someone clarify?