

Please code review whoarder, my first (tiny) Python module - ronj

Hi, I'm an Industrial Engineer-turning-developer. Over the last years my role was half-functional/half-development, but I've heard I'm able at technical stuff (and I LOVE IT) so I want to dig deeper that direction. So far I worked on VB, web dev, and C#/Java enterpriseware.<p>I've been playing with Python for a while, which I love and would like to target professionally, and am lucky enough to take a work break to train myself and hope to land a Python job. I read a few books (Clean Code, Dive Into Python 2+3) and the last two days I built https://github.com/ronjouch/whoarder<p>Being on my own I'm probably doing some things horribly wrong, so I'd like some feedback from experienced people. Anything will do! Advice on unit tests, general coding practices, python-specific features I use well or misuse, advice on books, tools, etc. I have a few questions, too:<p>1. test.py: I used unittest to get used to standard stuff. What would I gain by using nose? Do you always use it, or is unittest fine for small projects?<p>2. whoarder.py: The if block in is ugly but is the only workaround I found for relative imports in py3. Could I restructure my app to avoid the issue?<p>3.1. My app could fit in a single .py, but I made a module anyway, because I'm happy to drink the "It will help you structure and better SRP your code" kool-aid. Right?<p>3.2. ... but I put several classes in a single .py because I felt it worthwhile to bring them closer. Is that ok or frowned upon?<p>4. clippings.py:ClippingsIterator: I abstracted input parsing through a custom iterator, while I could just have split by separator. That way I don't read all upfront and can interrupt if something goes wrong. Good or overkill?<p>5. clippings.py:detect_encoding: any idea why chardet2 doesn't tell me about the BOM?<p>6. template.html/web: anything worth mentioning; I'm very much doing things 'by hand', maybe the wrong way and would appreciate general feedback.<p>Thanks for your help.
======
dalke
1) I use unittest since I ship code to others and I want fewer dependencies. I
think if I was doing this only for myself I might look to nose. But I've been
using unittest long enough that I know how to get what I want from it, so I
haven't bothered to learn nose.

2) I don't know about Py3 issues like that.

3.1) I am not much of a believer about "SRP". What's your public/published
API? Is it useful? It is understandable? Does your exposed module structure
give a domain view or an internal organizational view that might change after
refactoring. Behind the scenes you're more free to design as you wish.

3.2) In a small package like this, I would have everything in a single file.

4) I don't like the "import re" and other class attributes in
ClippingsIterator. I prefer them as global variables, and prefix with a "_" as
a hint that they aren't meant as part of the API. I believe this preference is
the standard consensus.

5) don't know.

6) I'm not knowledgeable enough to judge, but it looks fine to me.

Minor points:

    
    
        for c in clippings:
            self.clippings.append(c)
    

is more idiomatically written

    
    
        self.clippings.extend(c)
    

"(clipping['book'],clipping['author'])" should have a space after the comma.

In general it's best to have things like "import os" done once, at the top of
the module. "os" is a builtin, is always present, and the "import os" has
almost no overhead.

The jinja2 import is a bit more special. Do you want the code to work even if
jinja2 isn't installed? In which case, what you have is good. I would add a
comment, on the same line as "from jinja2 import ...", saying "available from
<url>". This gives an additional clue should a failure occur.

But if your code doesn't make sense without jinja2 then I would put the import
at the top of the module, to occur at import time. This is more like what
people expect, so would be easier to diagnose should a failure occur.

It doesn't look like "import_clippings" is part of the exposed API, since it
takes no parameters and always sets an instance variable that looks like it
should only be set once. In that case, prefix it with a "_" to indicate that
it shouldn't be called directly.

I would write ClippingsIterator() as a function rather than a class, in this
rough form:

    
    
        def read_clippings(source):
            detected_encoding = _detect_encoding(source)
            with open(source, mode='r', encoding=detected_encoding) as source_file:
                while 1:
                    clipping_buffer = []
                    count = 1
                    while True:
                        if count > 5:
                            raise InvalidFormatException("Input file doesn't seem to be a clippings file, separators are missing or damaged")
                        line = self.source_file.readline()
    
                        if not line:
                            return   # end of input
                        elif line != self.clipping_separator:
             ...
                yield line_dict
    

where "_detect_encoding" is your current method, but written as a module
function. (One hint that a method isn't actually method is if it never uses
the "self" variable. In those cases, I nearly always write them as module
functions.)

~~~
ronj
Great, and cool that you touched imports, that's also something I'm hesitating
on. I'll be doing your suggested refactors, and am considering consolidating
into a single .py. Thanks!

