I need to open a very large CSV file in Python, which is around 25GB in .zip format. Any idea how to do this in a streaming way, i.e. stopping after reading the first few thousand rows?
> I need to open a very large CSV file in Python, which is around 25GB in .zip format. Any idea how to do this in a streaming way, i.e. stopping after reading the first few thousand rows?
Replace the `file_paths` list in my proof of concept with your large file(s), delete the rest (lines 61-68, 77-79) and it should just work.
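This works fine with Python's standard library alone. `ZipFile.open` returns a file-like object that decompresses on demand, so a member of the archive can be read as a stream. There is no need to extract the file or hold all of the data in memory.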
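
    import csv
    import io
    import zipfile

    max_lines = 10  # stop after this many rows per file

    with zipfile.ZipFile("data.zip") as z:
        for info in z.infolist():
            # z.open() yields a binary stream that decompresses lazily
            with z.open(info.filename) as f:
                # csv.reader needs text; newline="" lets csv handle line endings
                reader = csv.reader(io.TextIOWrapper(f, newline=""))
                for i_line, line in enumerate(reader):
                    if i_line >= max_lines:
                        break
                    print(line)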
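
If you want the rows back instead of printing them, the same idea can be wrapped in a generator. This is only a sketch: `head_rows`, `n_rows` and the `encoding` default are names I made up for illustration, and it assumes the CSV inside the archive is UTF-8. `itertools.islice` stops pulling from the reader after `n_rows` rows, so the rest of the member is never decompressed.

```python
import csv
import io
import itertools
import zipfile


def head_rows(zip_path, n_rows, encoding="utf-8"):
    """Yield (member_name, row) for the first n_rows rows of each file in the zip.

    Hypothetical helper; encoding="utf-8" is an assumption about the data.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, they hold no data
            with archive.open(info) as raw:
                # Wrap the lazily decompressed binary stream as text for csv
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                # islice stops reading after n_rows rows
                for row in itertools.islice(csv.reader(text), n_rows):
                    yield info.filename, row
```

Usage would be something like `for name, row in head_rows("data.zip", 5000): ...`, which touches only the beginning of each member regardless of the archive's total size.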
Nice explanation, thank you for sharing.
Do you have any experience working with mobile apps? I'm wondering if any aspects of the design process you outlined differ when working with mobile apps?
Heh, I went to grad school with him. Haven't heard what he's been up to in over a decade.
I was solidly in the "C++" column back then, but have since become a data scientist who now uses numpy/Python for all machine learning. That talk was very interesting and helped me understand what they're doing in my old field these days. Thanks for sharing.