Hacker News

I am very interested in your thoughts about LaTeX: why do we need another "standard" if we have LaTeX? Do you provide something in addition that is not possible with LaTeX?

Thanks




I've been working on a LaTeX-based resume builder (link in profile), and I could see something like this being useful for letting users take their data with them and move it between services. Anything that reduces the time it takes for people to input information and see the resulting image/PDF is worth looking into, so I'll be checking this out.
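A minimal sketch of how portable structured data could feed a LaTeX-based builder, assuming the resume arrives as JSON with hypothetical `name`, `email`, and `work` fields (illustrative only, not any published schema). The builder would then compile the generated source to PDF as usual.

```python
import json

# Hypothetical resume data; field names are illustrative, not a published schema.
RESUME_JSON = """
{
  "name": "Jane Doe",
  "email": "jane@example.com",
  "work": [
    {"company": "Acme & Co", "position": "Engineer", "years": "2010-2013"}
  ]
}
"""

# Characters that must be escaped in LaTeX text mode.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_latex(resume: dict) -> str:
    """Render the structured resume into a minimal LaTeX document."""
    lines = [
        r"\documentclass{article}",
        r"\begin{document}",
        r"\section*{" + escape_latex(resume["name"]) + "}",
        escape_latex(resume["email"]),
        r"\subsection*{Experience}",
        r"\begin{itemize}",
    ]
    for job in resume.get("work", []):
        lines.append(
            r"\item " + escape_latex(job["position"]) + ", "
            + escape_latex(job["company"])
            + " (" + escape_latex(job["years"]) + ")"
        )
    lines += [r"\end{itemize}", r"\end{document}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_latex(json.loads(RESUME_JSON)))
```

Because the data lives in JSON rather than in the LaTeX source, another service could read the same file without knowing anything about this template.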


But is LaTeX machine-readable? Can I be given a LaTeX document, extract the essential data from it, and transform it into some other format or structure that I can use elsewhere?


I think we might be conflating 'machine-readable' and 'well-formed/structured' here...

What they're proposing is structure, not machine readability. Technically, with the state of OCR and other methods, many things like LaTeX, PDFs, .DOCs, etc. are all perfectly machine readable; the problem is getting the information out in a meaningful way, and that's usually done by standardizing on structure.

When it comes to specifying structure, XML and JSON are pretty much the front-runners (at least I think so), and since XML isn't so popular lately (because it's so verbose), JSON is in the spotlight now.
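To illustrate the structure point: once data follows an agreed structure, pulling out a specific field is a lookup rather than a parsing or OCR problem, whether the format is XML or JSON. The resume fragment and element/key names below (`resume`, `skills`, `skill`) are hypothetical, chosen only for illustration; only Python's standard library is used.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical resume fragment expressed in both formats.
RESUME_JSON = '{"name": "Jane Doe", "skills": ["Python", "LaTeX"]}'
RESUME_XML = (
    "<resume><name>Jane Doe</name>"
    "<skills><skill>Python</skill><skill>LaTeX</skill></skills></resume>"
)


def skills_from_json(text: str) -> list[str]:
    # Structure is explicit: the skills are a list under a known key.
    return json.loads(text)["skills"]


def skills_from_xml(text: str) -> list[str]:
    # Same structure, expressed with nested elements instead of an array.
    root = ET.fromstring(text)
    return [el.text for el in root.iter("skill")]


if __name__ == "__main__":
    print(skills_from_json(RESUME_JSON))
    print(skills_from_xml(RESUME_XML))
    # The XML version needs a closing tag for every element, so it is longer.
    print(len(RESUME_JSON), len(RESUME_XML))
```

Both parsers recover the same list; the difference is mostly in how much markup surrounds the data.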


"Technically, with the state of OCR and other methods, many things like LaTeX, PDFs, .DOCs, etc are all perfectly machine readable"

Uh, no. What does OCR even have to do with the 'machine readability' (for your strange definition of it) of the formats you mention?


Machine readability, unless I and the dictionary are mistaken, is related to whether or not you can input the data from some physical medium into a computer. It's not the same as asking whether something you read in makes sense, or is easy to parse, or is well-structured, which is what their idea is related to.

Beyond the fact that most of the things I noted are ALREADY in a computer-usable state (whether they make sense to humans looking at computers is another question), I was trying to point out that getting the information into a computer (a.k.a. machine readability) is not the issue we're struggling with now; it's structuring, characterizing, and understanding the information.

I mentioned OCR because it's used to parse resumes and find the important bits when other methods fail (or when someone submits something crazy, like a scanned copy of their resume instead of the file). That wouldn't be necessary if there were a standard, like the one being proposed, for people to adhere to, so that others could find what they wanted semantically.

Also, if you read closely, I said with the state of OCR AND OTHER METHODS... OCR has nothing to do with those other methods; it's just one method people use to extract data, or one way for machines to "read" the document.


Haha +1 to this. By that logic my ass is machine readable if I take a digital photo of it.


It depends on the machine. If you're feeding your picture to a machine with image recognition capabilities, it might just be able to tell that it's your ass. Some pretty big companies are working on just that (not on figuring out whether it's your ass, but on image recognition in the general sense).

The big picture is to make as many things as possible "machine readable" (things more difficult than resumes), and use machines to work faster than humans ever could.


To be fair, they're proposing a structure that lends itself to machine readability.


Yes, you can.
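A minimal sketch of extracting data from LaTeX, assuming the source uses known, consistent macros. The `\name{...}` and `\entry{...}{...}{...}` macros are hypothetical stand-ins for whatever a given resume template defines. Arbitrary LaTeX (custom macros, nested braces, `\input` files) would need considerably more work than this regex-based approach.

```python
import json
import re

# Hypothetical LaTeX resume source using template-specific macros.
LATEX_SOURCE = r"""
\documentclass{article}
\name{Jane Doe}
\begin{document}
\section{Experience}
\entry{Engineer}{Acme}{2010-2013}
\entry{Intern}{Initech}{2009}
\end{document}
"""

# Brace groups without nesting: this only handles simple, flat arguments.
NAME_RE = re.compile(r"\\name\{([^{}]*)\}")
ENTRY_RE = re.compile(r"\\entry\{([^{}]*)\}\{([^{}]*)\}\{([^{}]*)\}")


def extract(source: str) -> dict:
    """Pull the name and experience entries out of the LaTeX source."""
    name_match = NAME_RE.search(source)
    return {
        "name": name_match.group(1) if name_match else None,
        "work": [
            {"position": p, "company": c, "years": y}
            for p, c, y in ENTRY_RE.findall(source)
        ],
    }


if __name__ == "__main__":
    # Re-emit the extracted data as JSON for use elsewhere.
    print(json.dumps(extract(LATEX_SOURCE), indent=2))
```

This works only because the macros carry the structure; the same regexes would find nothing in a resume written with plain `\textbf` and tabular layouts.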




