
In other words, you write your own CSV parser, in this case using Java.

Btw, CSV files can contain values with commas and even newlines inside them. So if your point was that you don't have to write an _entire_ CSV parser, only a partial one, unfortunately that isn't true. https://en.wikipedia.org/wiki/Comma-separated_values#Example
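
To make that concrete, here's a tiny Java sketch (the record is loosely adapted from the Wikipedia example linked above) showing how a naive split-on-comma mangles quoted fields:

    // Minimal illustration: splitting on commas mishandles quoted CSV fields.
    public class CsvQuotingDemo {
        public static void main(String[] args) {
            // One CSV record whose fields contain a quoted comma and a quoted newline (RFC 4180 style).
            String record = "1997,\"Ford, E350\",\"Go get one now\nthey are going fast\"";

            // Naive split: breaks the quoted fields apart.
            String[] naive = record.split(",");
            System.out.println("naive field count: " + naive.length); // prints 4, not 3

            // A correct parser has to track quote state character by character instead.
        }
    }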

>I very much doubt that any CSV parser would try to load the file all at once

I agree, but a naive import solution would likely try to store all the rows at once, probably in some sort of list or array. That accumulation is what would cause the memory failure, not the file read itself.
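
To illustrate the difference (a rough sketch that ignores quoting entirely and just treats one line as one record): the failure mode is the accumulation, not the reading.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;

    public class StreamVsCollect {
        // Naive: accumulates every row in memory; a 100GB file will blow a 500MB heap.
        static List<String[]> loadAll(Path csv) throws IOException {
            List<String[]> rows = new ArrayList<>();
            try (BufferedReader r = Files.newBufferedReader(csv)) {
                String line;
                while ((line = r.readLine()) != null) {
                    rows.add(line.split(",")); // also ignores quoting, see above
                }
            }
            return rows;
        }

        // Streaming: handles one row at a time, so memory stays bounded by the row size.
        static long countRows(Path csv) throws IOException {
            long count = 0;
            try (BufferedReader r = Files.newBufferedReader(csv)) {
                while (r.readLine() != null) {
                    count++; // process the row here instead of storing it
                }
            }
            return count;
        }
    }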




Possibly, but unlikely (most CSV libraries are built around iterators). In any case, the problem is stupidly underspecified. For example, what if the 100GB CSV is 100 1GB rows? What if the input is UTF-16 or UTF-32? How do you deal with the various Unicode line separators?
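
For what it's worth, the iterator style looks roughly like this with Apache Commons CSV (just one example library; any streaming parser works the same way):

    import java.io.IOException;
    import java.io.Reader;
    import java.nio.file.Files;
    import java.nio.file.Path;

    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVParser;
    import org.apache.commons.csv.CSVRecord;

    public class IteratorStyle {
        public static void main(String[] args) throws IOException {
            Path csv = Path.of(args[0]);
            try (Reader reader = Files.newBufferedReader(csv);
                 CSVParser parser = CSVFormat.DEFAULT.parse(reader)) {
                // CSVParser is Iterable<CSVRecord>: records are pulled one at a time,
                // so memory use stays bounded even for a huge file. Quoted commas and
                // embedded newlines are handled by the parser, not by us.
                for (CSVRecord record : parser) {
                    // process record.get(0), record.get(1), ...
                }
            }
        }
    }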

It just tests how well the interviewee knows CSV, which is an ill-specified format anyway. It's a fake problem, since no sane person would parse 100GB CSVs on a 500MB VPS, and in real life you'd just try the naive solution, see why it didn't work, and iterate.



