So is this 98% with stained images from the dataset? With my limited knowledge, I thought CNNs were data hungry. 98% on ~400 images seems fairly impressive, but I wonder how well it will perform on unseen images.
Tangentially, there seems to be little news on new open-source datasets for anything. I saw Google release a few some time back. Do research companies only care about advancing techniques so they can apply all that public research to their private datasets? Or is there genuinely no way to create datasets? For example, is it financially/technologically impossible to make an "ImageNet" of cells? (Or maybe a lot of datasets are coming out and I am just unaware of them.)
It is very expensive and time-consuming to create a large, properly labeled cell-image dataset. In general you need >2 pathologists to confirm the cell types (they sometimes disagree, and you usually take the majority vote); this almost never happens with cat images ;) There are also many acquisition modalities for image capture in microscopy, different stains for the same cell types, etc. Simple RGB cameras are actually considered fairly low-tech for this kind of work.
ps. I am no deep learning expert (I use more 'traditional' ML), but as you pointed out, ~400 images with these techniques can be a recipe for overfitting disaster.
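A standard first mitigation for such a small dataset is label-preserving augmentation. Here's a minimal NumPy sketch (the array shapes and the 400-image count are just illustrative placeholders): flips and 90-degree rotations expand each image into 8 variants without changing what cell type it shows.

```python
import numpy as np

def augment(images):
    """Expand a small image set with label-preserving transforms:
    four 90-degree rotations, each also mirrored (8x total)."""
    out = []
    for img in images:
        for k in range(4):                 # 0, 90, 180, 270 degrees
            rot = np.rot90(img, k)
            out.append(rot)
            out.append(np.fliplr(rot))     # mirrored copy of each rotation
    return np.stack(out)

# Stand-in for ~400 cell images of 32x32 RGB (random data for the sketch)
rng = np.random.default_rng(0)
cells = rng.random((400, 32, 32, 3))
aug = augment(cells)                        # 400 * 8 = 3200 images
```

This only helps so much, though: the augmented images are highly correlated with the originals, so it reduces overfitting rather than substituting for more labeled data.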
Would something like training on images of rat cells and then transfer learning to human cells be worthwhile? The author of the article tried it with ImageNet and it didn't work out, but I wonder about the viability of that technique with non-human cells.
Well, the same principle as with VGGNet applies here too. If the rat images differ 'significantly' (whatever that means; I am a research engineer, not a pathologist), then you will have nothing to transfer. Maybe it would be more fruitful to transfer via a huge amount of artificially generated cell images (there are toolboxes for that, and it's not just linear transformations like rotation etc.) blended with some subset of VGGNet (or similar) trained only on 'circular' objects.
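The transfer idea above boils down to: freeze a pretrained feature extractor and train only a small classifier head on the ~400 real images. Here's a self-contained NumPy sketch of that split; `W_frozen` is a hypothetical stand-in for a pretrained trunk (real transfer would use e.g. VGG weights or a net trained on synthetic cells), and the labels are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained trunk: a fixed random projection.
# In real transfer learning these weights would come from a network
# pretrained on a large source domain (synthetic cells, VGGNet, etc.).
W_frozen = rng.standard_normal((32 * 32 * 3, 64))

def frozen_features(x):
    # Frozen trunk: W_frozen is never updated during training.
    f = np.maximum(x.reshape(len(x), -1) @ W_frozen, 0.0)  # ReLU features
    return (f - f.mean(0)) / (f.std(0) + 1e-8)             # standardize

def train_head(X, y, lr=0.1, epochs=200):
    # Only this small logistic-regression head sees gradient updates,
    # so ~400 labeled images have far fewer parameters to overfit.
    F = frozen_features(X)
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        z = np.clip(F @ w, -30.0, 30.0)     # avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-z))        # sigmoid predictions
        w -= lr * F.T @ (p - y) / len(y)    # gradient step on head only
    return w

X = rng.random((400, 32, 32, 3))                 # stand-in cell images
y = rng.integers(0, 2, size=400).astype(float)   # placeholder labels
w = train_head(X, y)
```

The catch the comment points at: if the source domain (rat cells, or 'circular' ImageNet objects) produces features that don't discriminate the target classes, the frozen trunk transfers nothing useful, and no amount of head training fixes that.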