
Can't really agree. Literature is literature because of form, not content: every literary work is its own separate world, with references to other similarly separate worlds from the past. Data analysis can help find invariants, as in Meter in Poetry by Prof. Nigel Fabb https://www.amazon.co.uk/Meter-Poetry-Theory-Nigel-Fabb/dp/0... , but it cannot find the reason why literature is literary at all, which comes from the social sciences. Mythopoesis, as the cumulative sum of the real and imaginary worlds thought up by humans, is another possible result, but that would not be literature; it would be languages and theory of language instead.

> every literary work is its own separate world with references to other similarly separate worlds from the past.

This sounds like the basis of hypertext fiction (https://en.wikipedia.org/wiki/Hypertext_fiction), which has of course existed for much longer than big data as a concept.

As for what characterizes literature, I'm inclined to agree with you. However, is there not inherent value in another reading, so to speak, even one that is computational in nature? If we take Barthes to be correct, what does it say when a well-trained NLP model draws similar conclusions to humans with regard to imagery, analogy, irony, metaphor, etc. when reading major literary works? Different conclusions? Or no conclusions? What if some works "compute" and some don't? What if some works' features are culture-independent, i.e., a model trained on an Eastern literary corpus computes similar features to a model trained on a Western literary corpus, while other features aren't?

Perhaps these questions are more superficial than I'm making them out to be, but it seems presumptuous to assume that methods that look at this problem from this angle won't get at _any_ literary features.

You're right, of course. My point is that literary grade is not an invariant: it changes with time, culture, and the historical events of a given community. There are so many cases of artists being considered shit in one period, good in another century, then shit again, indifferent, or even canonical for a while. What exactly should data science train on for a universal literary / not literary classifier, then? Or what should unsupervised learning cluster? HN is full of bright minds, so I'm sure a formula might be reasonably suggested, but that's not the point of literature as a form of its own, like maths or music.
