
Contributing os.scandir() to Python - benhoyt
http://benhoyt.com/writings/scandir/
======
raymondh
Thanks for the write-up. It does a good job of communicating:

* the joys of contributing to Python

* how time and labor intensive the process can be

* the complexities of dealing with multiple operating systems

* what it is like to be alternately helped or hindered by other developers

------
int_19h
I think that sort of thing is an often-underappreciated benefit of Python as
an ecosystem: it has a well-defined procedure for adding and changing things
in the core language and library, one that strikes a pretty good balance between
agility and prudence and generally yields great results.

------
danso
That was a great write-up. I liked how you not only covered the technical
details, but the human details, too -- how to get your idea noticed by python-
dev, how to gain early adoption.

------
phillberson
I think the additional Windows attributes were initiated here:

[https://github.com/benhoyt/scandir/issues/22](https://github.com/benhoyt/scandir/issues/22)

(Source: It was my issue :) )

I've been using the project since it was named betterwalk and was very glad
to see PEP 471 approved. Great work, Ben.

(Now I just need to upgrade the project I use it in to 3.5)

~~~
benhoyt
Aha, I'd completely forgotten that had come via a scandir issue, sorry! Fixed
here:
[https://github.com/benhoyt/benhoyt.github.com/commit/0c156b9...](https://github.com/benhoyt/benhoyt.github.com/commit/0c156b9b16d918eca09e3eb6e38e78ead992a2fc)
(let me know your name if you want to be called out by name).

~~~
phillberson
Ha, nah... that wasn't the idea of mentioning it. Probably shouldn't have...

Thanks again for all your efforts.

------
rajathagasthya
Great work! To contribute to Python, would you say it's necessary to be
comfortable with writing C code?

~~~
benhoyt
No, not necessarily at all -- there's a ton of pure Python code in the
standard library, so if you're adding to or making improvements there, no need
to know C. And then there's documentation and other non-code issues too.

~~~
rajathagasthya
Perfect, thanks. Been wanting to contribute to Python upstream for a while
now, but it's overwhelming every time I take a look. :)

~~~
xapata
Remember that the status quo is the sane default. It's much more important to
fix bugs and add docs than to add features.

------
AdamJacobMuller
Very interesting. I just benchmarked this (the Python 2.7 module _only_) with
an internal application that walks over filesystems and found scandir.walk()
to be, inexplicably, slightly slower than os.walk().

I think part of the issue (though I've not tested this yet) is that we're
stat()-ing every single file _anyway_, so with the OS cache considered, it
really ends up not mattering.

I thought the additional cost of the extra system calls (even if they were
entirely cached in memory) would add up, but it seems like _something_ the
scandir module is doing is just less efficient in general.

I'm devising some much simpler and more controllable tests (still with our
exact workload) and will test more, though.
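
For reference, a minimal sketch of the kind of comparison described above,
assuming Python 3.5+ (where os.scandir() is in the standard library rather
than the scandir backport). The helper names total_size_walk,
total_size_scandir and time_call are made up for illustration, and the sketch
assumes a tree without symlinks:

```python
import os
import time


def total_size_walk(top):
    """Sum file sizes using os.walk() plus an explicit os.stat() per file."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            total += os.stat(os.path.join(dirpath, name)).st_size
    return total


def total_size_scandir(top):
    """Sum file sizes using os.scandir() and DirEntry.stat()."""
    total = 0
    for entry in os.scandir(top):
        if entry.is_dir(follow_symlinks=False):
            # Recurse into real subdirectories only.
            total += total_size_scandir(entry.path)
        else:
            # Assumption: no symlinks in the tree, so this is a regular file.
            total += entry.stat(follow_symlinks=False).st_size
    return total


def time_call(func, top, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(top)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling time_call(total_size_walk, path) and time_call(total_size_scandir,
path) on the same tree gives two timings to compare. On POSIX systems
DirEntry.stat() still costs a system call per file, so a workload that needs
full stat data for every file would likely see less benefit there than on
Windows.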

~~~
benhoyt
That's very intriguing to me -- I've rarely/never seen it be _slower_, and
it's usually at least several times faster. Are you using benchmark.py? If so, can
you send a link to a Gist with the output? It may be that the C module is not
compiling, and it's falling back to the much slower pure Python module.

~~~
AdamJacobMuller
I wasn't running your specific benchmark test. I've been running some more
controlled tests now that show it basically being on par with listdir(), which
I still consider completely oddball.

Also, not remotely hating on what you did here. I actually wrote/tested a very
similar concept in Python several years ago for the exact same reason. My
C skills (and mostly time resources) weren't really up to par to try to get it
included in core. I mostly _really_ want this to work as well as I think it
should :)

I'll run some tests with your benchmark.py (and compare that to my benchmark
script) and post some results.

------
billiob
This issue was already looked into decades ago.

NFSv3 introduced READDIRPLUS in 1995:
[https://tools.ietf.org/html/rfc1813#section-3.3.17](https://tools.ietf.org/html/rfc1813#section-3.3.17)

FUSEv3 (the library) will have support for such a call. It is already in the
git repository
([https://github.com/libfuse/libfuse/blob/master/include/fuse_...](https://github.com/libfuse/libfuse/blob/master/include/fuse_lowlevel.h#L1030)).

Sadly, there is still no syscall on Linux to do a "readdirplus" (usually
called xgetdents()).
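
To illustrate the readdirplus-style idea from the Python side: a small sketch,
assuming Python 3.5+, in which the entry type comes back with the directory
read itself. split_entries is a hypothetical helper name, not part of any
library:

```python
import os


def split_entries(path):
    """Return (dirs, files) as sorted name lists for one directory."""
    dirs, files = [], []
    for entry in os.scandir(path):
        # In most cases is_dir() is answered from data returned by the
        # directory read, so no separate stat() call is made. A system call
        # is still needed for symlinks (follow_symlinks defaults to True)
        # or when the file system does not report the entry type.
        if entry.is_dir():
            dirs.append(entry.name)
        else:
            files.append(entry.name)
    return sorted(dirs), sorted(files)
```

This only covers the file type; full attributes such as size or mtime on
Linux still need a stat() per entry, which is the gap a readdirplus-style
call would close.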

------
vectorEQ
Nice to read this, man, thanks! Interesting to see your experience in this
process and what it's like to contribute to such a project. I've stopped
writing a lot of Python code, admittedly, because I prefer to torture myself with
unnecessarily tedious things... but I like how Python evolves, and these kinds of
contributions are really important and are what make me feel Python is one of the
more powerful scripting / programming tools if you're not writing in
machine-native code. Thanks for the write-up, and of course your contribution to
improving all of our codez =]

------
cmdrfred
Nice work, I'll keep it in mind next time.

