
Datapy - JacksonWoo
https://github.com/JacksonWuxs/datapy
======
petters
Strange to see this on the front page.

I mean, it was probably a good learning experience, but a CSV parser written
in pure Python is not going to be faster than numpy. There must be something
wrong with that benchmark.

I don't see how this would be useful to anyone except the author. And that is
completely fine! I am just pointing this out since it was on the front page.

------
hatsunearu
Good stuff, though not really sure why you'd use this instead of just parsing
with `csv` into a list or whatever. If this is a learning project, let me make
some suggestions:

In `readtable`, I'm not entirely sure what `col` needs to be, but if I'm
reading this correctly, it's like a list of indices. I'd try using the
generator pattern instead of doing `for j in col` and `every[j]`.
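
To make that concrete, here is a rough sketch of one way to read the
suggestion. I'm assuming `col` is a list of integer indices and that each row
(`every`) is a list of fields; `select_columns` and `rows` are names made up
for illustration, not anything from the DataPy source.

```python
def select_columns(rows, col):
    """Lazily yield each row reduced to the positions listed in `col`.

    Assumes `rows` is any iterable of indexable rows and `col` is a list
    of integer indices (both are guesses about what `readtable` works with).
    """
    for every in rows:
        # Build the reduced row in one expression instead of an explicit
        # inner loop that appends field by field.
        yield [every[j] for j in col]


rows = [["a", "b", "c"], ["d", "e", "f"]]
selected = list(select_columns(rows, [0, 2]))
```

Because it is a generator, nothing is built until the caller iterates, so a
large file does not have to sit in memory all at once.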

~~~
noobermin
Not sure if there is a general place to learn this stuff (like a specific
tutorial), but when hacking in Python, it helps not to think in terms of
C-like loops, where you iterate over a collection using a counter like
`for(int i=start; i < end; ++i) {...`. Instead, for loops in Python are
specifically for iterating over collections.

    >>> a=[1,2,3,4]

Instead of

    >>> for i in range(len(a)):
    ...     print("->{}".format(a[i]))

You can (should!) just do

    >>> for x in a:
    ...     print("->{}".format(x))

The entire point of for loops in Python is iterating over collections. Even
better, the very same syntax can be used if `a` is _not_ a list that can be
indexed, like a generator, for example.
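
As a small sketch of that last point: the same `for x in ...` loop works on
both a list and a generator. The helper names `arrows` and `countdown` are
invented for this example.

```python
def arrows(items):
    """Yield a formatted string for each item of any iterable."""
    for x in items:
        yield "->{}".format(x)


def countdown(n):
    """A generator: it produces values one at a time and cannot be indexed."""
    while n > 0:
        yield n
        n -= 1


# The loop inside `arrows` is identical in both cases.
from_list = list(arrows([1, 2, 3, 4]))
from_generator = list(arrows(countdown(3)))
```

Trying `countdown(3)[0]` would raise a `TypeError`, which is why the
counter-based `range(len(a))` version cannot be reused here.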

------
JacksonWoo
DataPy is a lightweight data processing library that can be used to easily
load data from data profiles and to keep or reduce part of the data.
Additionally, it contains some basic formulas that help developers understand
the basic variables of the data set. The first stable version (V1.2.3) will be
uploaded to PyPI by March 15.

------
aldanor
\- It's Python2-only

\- The CSV parser is slow and not RFC-compliant (e.g. quotes)

\- The parser is 10x slower than pandas

\- Iteration over rows is the same speed as pandas

Proof:
[https://gist.github.com/aldanor/ecae5caca6dbf686ddb22e6e2730...](https://gist.github.com/aldanor/ecae5caca6dbf686ddb22e6e27305822)

Probably a good learning exercise for OP, but strange to see this on HN front
page...
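
To illustrate the quoting point, here is a minimal Python 3 sketch comparing a
naive comma split with the standard library `csv` module. The naive split is
only a stand-in for "a parser that ignores quotes"; it is not taken from the
DataPy source, and the sample data is made up.

```python
import csv
import io

# One header line and one record whose fields contain a comma and
# doubled quotes, both of which are legal inside a quoted CSV field.
text = 'id,name,comment\n1,"Smith, John","said ""hi"""\n'

# Naive approach: split the record on every comma, ignoring quotes.
naive = text.splitlines()[1].split(",")

# Standard library parser: honours quoted fields and doubled quotes.
parsed = list(csv.reader(io.StringIO(text)))[1]
```

Here `naive` ends up with four pieces because the comma inside the quoted
name is treated as a separator, while `parsed` has the three intended fields.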

