
Labeling observations with something richer than a list of column-label strings at the top of the file would make it possible to mine the data for insights, or to produce a universal theory that covers what has actually been observed rather than the presumed limits of theory.

CSVW is CSV on the Web as Linked Data.

With 7 metadata header rows at the top, a CSV could be converted to CSVW, with URIs identifying units such as metres (or "meters") or feet.
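To make the idea of unit URIs concrete, here is a minimal sketch of CSVW-style metadata built in Python. CSVW metadata is commonly written as a separate JSON document that describes the CSV, which is the form shown here. The file name, column names, and the helper build_csvw_metadata are hypothetical, and attaching a unit through a QUDT property URL is an assumed convention, not something CSVW itself mandates.

```python
import json

# Assumption: QUDT's "unit" property and unit URIs are used to label units.
QUDT_UNIT = "http://qudt.org/schema/qudt/unit"


def build_csvw_metadata(csv_url, columns):
    """Return a CSVW-style metadata dict describing one CSV file.

    `columns` is a list of (name, title, datatype, unit_uri) tuples;
    unit_uri may be None when a column has no unit.
    """
    schema_columns = []
    for name, title, datatype, unit_uri in columns:
        col = {"name": name, "titles": title, "datatype": datatype}
        if unit_uri is not None:
            # Extra annotation keyed by an absolute URL.
            col[QUDT_UNIT] = {"@id": unit_uri}
        schema_columns.append(col)
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {"columns": schema_columns},
    }


if __name__ == "__main__":
    # Hypothetical dataset: one measured column in metres, one plain label.
    metadata = build_csvw_metadata(
        "observations.csv",
        [
            ("height", "Height", "number", "http://qudt.org/vocab/unit/M"),
            ("site", "Site", "string", None),
        ],
    )
    print(json.dumps(metadata, indent=2))
```

With a URI on the column, "metre" and "meter" stop being two different strings and become one identifier that a consumer can look up.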

If the publisher of a ScholarlyArticle does not indicate that a given CSV (or, better, a :Dataset) that is :partOf an article is a :premiseTo the presented argument, then a human grad student or an LLM has to identify the links or textual citations to the dataset CSV(s).

Easy: identify all of the pandas.read_csv() calls in a notebook.

Expensive: find the citation in a PDF, search for the text in "quotation marks", and try to guess which search result contains the dataset that is a premise of the article.

Or: identify each premise in the article, pull the primary datasets, and run an unbiased AutoML report to identify linear and nonlinear variance relations, then test the data-dredged causal chart before or after manually reading the abstract.
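The "easy" case above can be sketched with the standard library alone. This is a rough sketch, not a complete tool: it assumes the notebook is the usual .ipynb JSON with a "cells" list, it matches any call whose function name is read_csv without checking that it really comes from pandas, and it only recovers paths passed as string literals. The function name read_csv_sources is hypothetical.

```python
import ast


def read_csv_sources(notebook):
    """Return the first argument of every read_csv(...) call in a notebook.

    `notebook` is the parsed .ipynb JSON (a dict). String-literal arguments
    are returned as-is; anything else (a variable, an f-string) is None.
    """
    found = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [
            line
            for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # skip cells that still do not parse
        for node in ast.walk(tree):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            if isinstance(func, ast.Attribute):
                name = func.attr  # e.g. pd.read_csv
            else:
                name = getattr(func, "id", None)  # e.g. bare read_csv
            if name != "read_csv":
                continue
            arg = node.args[0] if node.args else None
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                found.append(arg.value)
            else:
                found.append(None)
    return found
```

To run it on a file, load the notebook with json.load() and pass the resulting dict in. Every None in the result is a dataset that still needs a human or an LLM to trace, which is where the expensive path starts.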





