
I'm not sure if you're objecting to the name "Datashader", but surely every library needs a name, and this one is accurate in that it allows the sort of shading that one does for 3D rendering to be applied to 2D data plotting. Or are there other buzzwords used in the docs you find objectionable?



If I said I was an expert in 'big data visualization with billions of points' and had written my own 'out of core' rendering library that I dubbed 'data shader', complete with a paper where I coined the term 'Abstract Rendering' (or 'AR' for short), and you then found out that I was just reading points from disk and drawing them with OpenGL's point-drawing function, what would you think?

The term 'out of core' rendering comes from raytracing, where you really do need all the geometry available. They are applying it to trivial accumulation, where it was never a problem in the first place. That's like me writing a paper on how to make a balloon airtight. That's how it has always worked; why would I take credit for something that was never a problem?


Sigh. Datashader is not a paper, it's an actual usable piece of software, so it should be compared to other tools and libraries for rendering data. Unlike nearly every other 2D plotting library available for Python, it can operate in core or out of core, so it's entirely appropriate to advertise that fact (why hide it?). Unlike OpenGL's point-drawing functions and nearly every other 2D plotting library available for Python, it avoids the overplotting and z-ordering issues that make visualizations misleading (so why hide that?). Unlike NumPy's histogram2d, it allows you to define what it means to aggregate the contents of each bin (mean, min, std, etc.), to focus on different aspects of your data. It's a mystery to me why you think Datashader should somehow fail to advertise what it's useful for!
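
For example, the per-bin aggregation looks roughly like this (a sketch assuming a pandas DataFrame with x, y, and value columns; the reductions shown are from Datashader's documented API):

    import numpy as np
    import pandas as pd
    import datashader as ds
    import datashader.transfer_functions as tf

    # assumed toy dataset: a few million (x, y) points with an associated value
    n = 5_000_000
    df = pd.DataFrame({'x': np.random.standard_normal(n),
                       'y': np.random.standard_normal(n),
                       'value': np.random.random(n)})

    cvs = ds.Canvas(plot_width=600, plot_height=400)        # bins defined in data space
    counts = cvs.points(df, 'x', 'y', agg=ds.count())       # histogram2d-style counts
    means = cvs.points(df, 'x', 'y', agg=ds.mean('value'))  # per-bin mean instead
    img = tf.shade(means)                                    # map the aggregate to colors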


> Datashader is not a paper

https://www.semanticscholar.org/paper/Abstract-rendering%3A-...

You keep defending the project as a whole while not confronting the fact that they are touting rendering breakthroughs, while I have given a lot of explanation of why there are no rendering breakthroughs and the actual rendering, no matter where it is done and no matter how much data is used, is trivial. I'm not sure what can help you focus in on the point I'm making here; I haven't strayed from it. This isn't about the workflow or the language used or anything else. It is about false claims and buzzwords that make people think it is solving rendering problems that have never existed, like 'accuracy' and 'big data' (in the context of these visualizations).


They are touting it specifically in the context of the visualization of very large datasets.

The fact that their software exists is itself a breakthrough. It enabled me to do things that other equivalent tools (such as those in statistical packages) did not allow. Otherwise I would have been reduced to implementing my rendering pipelines directly, and I would also have had to make many of the same design decisions they made, such as doing things out of core.


I do not know the data shader work well enough to defend it, nor to even know if it deserves defense, but I can at least respond to your argument.

You imply that accumulative rendering into a framebuffer solves large statistical integration problems. But the framebuffer is not implemented using abstract math over the real or integer domains. You need to consider the numerical effects of adding the smallest value (one sample) into a running sum.

If you use an integer/fixed-precision buffer for the running sums, you need enough bits to avoid overflow even if billions of points land in one bin. You might think to use floating point, but that has worse problems for running sums: you are effectively limited to the number of bits in the mantissa when continuously adding small increments.

So, you cannot scale up the naive approach of zeroing the framebuffer and blending/accumulating points from a stream. You need to do some hierarchical aggregation to accurately represent sub-populations and combine them in a numerically robust manner. Most likely, you would also like to precompute some of these results to support better interactive performance, much like mip-mapping is used to provide more accurate texture sampling at multiple rendering scales.
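
The float32 case is easy to demonstrate with NumPy (a standalone sketch of the effect, not code from Datashader):

    import numpy as np

    # A float32 mantissa holds 24 bits, so a sequential running sum of
    # unit-valued samples silently stops growing once it reaches 2**24.
    data = np.ones(2**25, dtype=np.float32)          # ~33.5M samples of 1.0
    print(np.cumsum(data, dtype=np.float32)[-1])     # 16777216.0 -- half the mass is lost

    # Hierarchical aggregation sidesteps this: sum sub-populations separately,
    # then combine partial sums of comparable magnitude (here in float64).
    partials = [c.sum(dtype=np.float64) for c in np.array_split(data, 32)]
    print(sum(partials))                             # 33554432.0, the exact total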


I think you are deliberately trying to misunderstand what is being done in this project.

It's not about what APIs are being used to render whatever. At that level of analysis, all that anybody is ever doing is just memcpy and bitblt. Rather, datashader provides a framework for applying semantically meaningful, mathematical transformations to datasets as they're being accumulated, as those accumulations are converted into aesthetic/geom primitives, and as those primitives are rendered into colors. It really is "renderman for data", along with arbitrary vertex/texture shaders, driven by a dynamic rasterizer that can use arbitrary bins in data space (not merely physical pixels).
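
To make that concrete, the aggregate / transform / colorize stages look roughly like this (a sketch, not lifted from the project's docs; the aggregate comes back as an xarray DataArray, so arbitrary math can be applied to it before it is turned into pixels):

    import numpy as np
    import pandas as pd
    import datashader as ds
    import datashader.transfer_functions as tf

    # assumed toy data: ten million points
    n = 10_000_000
    df = pd.DataFrame({'x': np.random.standard_normal(n),
                       'y': np.random.standard_normal(n)})

    cvs = ds.Canvas(plot_width=800, plot_height=500)   # rasterizer over data-space bins
    agg = cvs.points(df, 'x', 'y', agg=ds.count())     # accumulate: an xarray of counts
    agg = agg.where(agg > 0)                           # arbitrary math on the aggregate itself
    img = tf.shade(agg, how='eq_hist')                 # nonlinear transfer function to colors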

BTW "Out of core" does NOT come from raytracing; in fact its history in computing is a term for anything that exceeds physical memory. We use it all the time in scientific/HPC and data science because datasets are frequently much larger than available memory.

https://en.wikipedia.org/wiki/External_memory_algorithm
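
The basic out-of-core pattern is just chunked accumulation; a minimal sketch, assuming a CSV of x/y points far larger than RAM (the filename is made up):

    import numpy as np
    import pandas as pd

    # Out-of-core 2D histogram: stream the file in fixed-size chunks and
    # accumulate into one small grid, never holding the full dataset in memory.
    bins, extent = 512, [[-5, 5], [-5, 5]]
    grid = np.zeros((bins, bins), dtype=np.uint64)
    for chunk in pd.read_csv('huge_points.csv', chunksize=1_000_000):
        h, _, _ = np.histogram2d(chunk['x'], chunk['y'], bins=bins, range=extent)
        grid += h.astype(np.uint64)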


I think you are misrepresenting what is being done in this project. People seem to like it. They say it has workflow refinements. That's great, but there isn't anything new being done in rendering here unless doing something trivial in a pointlessly complex way and renaming fundamental techniques counts as a breakthrough.

Focus on the workflow refinements; saying there are rendering breakthroughs here is snake oil.



