
Show HN: ConvTools – generates Python code of conversions, aggregations, joins - westandskif
https://github.com/itechart-almakov/convtools
======
MJSplot_author
Just my initial response.

Maybe find a better first example? It is quite code-dense:

    
    
        conv = c.aggregate({
            "a": c.reduce(c.ReduceFuncs.Array, c.item("a")),
            "a_sum": c.reduce(c.ReduceFuncs.Sum, c.item("a")),
            "b": c.reduce(c.ReduceFuncs.ArrayDistinct, c.item("b")),
        }).gen_converter()
        conv(input_data)
    

when compared to a trivial native Python equivalent:

    
    
        conv = lambda data: {
            'a': [el['a'] for el in data],
            'a_sum': sum(el['a'] for el in data),
            'b': list(set(el['b'] for el in data)),
        }
        conv(input_data)
    

which appears to have the same functionality. This is quite off-putting, and it
took me a while to dig down and find why convtools can offer more than just an
extra abstraction layer to learn. Perhaps pick an example that shows off the
non-trivial functions, like joins or GroupBy?

~~~
westandskif
Just to add to my previous answer: the trivial native Python equivalent doesn't
have the same functionality, because it consumes the data iterator 3 times in your
case, while convtools would consume it only once.
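
For illustration, a hand-written single-pass equivalent (plain Python, not
convtools-generated code) would have to carry all three accumulators through one
loop, which is why it also works when the input is a one-shot iterator:

```python
# Hand-written single-pass equivalent (a sketch, not convtools output):
# all three results are built in one iteration, so a generator/iterator
# that can only be consumed once still works.
def conv_single_pass(data):
    a, a_sum, seen, b = [], 0, set(), []
    for el in data:
        a.append(el["a"])
        a_sum += el["a"]
        if el["b"] not in seen:   # preserve first-seen order for distinct values
            seen.add(el["b"])
            b.append(el["b"])
    return {"a": a, "a_sum": a_sum, "b": b}

rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "x"}, {"a": 3, "b": "y"}]
conv_single_pass(iter(rows))
# {'a': [1, 2, 3], 'a_sum': 6, 'b': ['x', 'y']}
```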

------
uryga
looks really cool! congrats, seems like a lot of work went into this, and code
generation is always fun.

i do have some feedback about the readme though...

maybe i'm not the intended audience, but the examples didn't work great for me
- the readme just shows the code without the inputs/outputs, so you kind of
have to guess what it does. ("show me your tables, not your flowcharts" etc.)

i also think you should add some more basic examples / common tasks, e.g.
converting AOS to SOA:

    
    
      [{'a': 5,  'b': 'foo'},
       {'a': 10, 'b': 'bar'} ]
        
      c.fun['stuff'](data) # look how concise!
      
      {'a': [5,     10   ],
       'b': ['foo', 'bar']}
    

and build up to more complex stuff from there, to help readers get a feel for
the library.
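
(for the record, `c.fun['stuff']` above is made up -- here's a plain-python
sketch of what such an AOS-to-SOA conversion actually does:)

```python
# Plain-Python sketch of the AOS -> SOA conversion described above
# (the `c.fun['stuff']` call in the example is a placeholder,
# not a real convtools API).
def aos_to_soa(rows):
    # take the keys from the first row; every row is assumed to have them
    return {key: [row[key] for row in rows] for key in rows[0]} if rows else {}

data = [{"a": 5,  "b": "foo"},
        {"a": 10, "b": "bar"}]

aos_to_soa(data)
# {'a': [5, 10], 'b': ['foo', 'bar']}
```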

and i think the examples should be a bit higher up in the readme – ofc wanting
to describe how cool the implementation is is natural :) but honestly, when
i'm looking at a library like this, i want to be able to quickly assess
whether it might be useful for me - the implementation is kind of secondary
in most cases.

now, despite what i wrote above, i'd love to hear some stuff about the
implementation :) as someone who also wrote a library that does runtime python
codegen, what's your approach to that?

~~~
westandskif
thank you very much for the feedback! this is very valuable! :)

Regarding the README, I'll improve it within the next few days.

As for the approach, the main assumption was that everything stays simple as
long as you deal with expressions only, so I introduced every expression I
needed as a conversion object (each able to generate its own code within a
shared context).

The exceptions are the custom code-generating parts (e.g. aggregate, reducers)
and the part where I break down piped conversions into a series of statements
in the top-level converter.

Another tricky piece was supporting parametrization - e.g. c.input_arg here -
[https://convtools.readthedocs.io/en/latest/cheatsheet.html#converting-using-hardcoded-maps-filters](https://convtools.readthedocs.io/en/latest/cheatsheet.html#converting-using-hardcoded-maps-filters)
So it was necessary to make every conversion know about every inner dependency
it has, so that all dependencies pop up and the function signatures & parameters
needed during internal generation of functions are known.
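
A minimal sketch of that idea (hypothetical names, not convtools internals):
each expression is an object that knows how to render itself as Python source,
and composite expressions recurse into their children:

```python
# Hypothetical sketch of "every expression is a conversion object that
# generates its own code" -- names are made up, this is NOT convtools internals.
class Item:
    """Renders a subscript lookup, e.g. data['a']."""
    def __init__(self, key):
        self.key = key
    def gen_code(self, var):
        return f"{var}[{self.key!r}]"

class Call:
    """Renders a function call around an inner expression."""
    def __init__(self, func_name, arg):
        self.func_name = func_name
        self.arg = arg
    def gen_code(self, var):
        return f"{self.func_name}({self.arg.gen_code(var)})"

expr = Call("len", Item("a"))
code = f"lambda data: {expr.gen_code('data')}"
converter = eval(code)        # compiles: lambda data: len(data['a'])
converter({"a": [1, 2, 3]})   # 3
```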

~~~
uryga
sounds interesting, i'll have a look at the code when i have time. i've mostly
done compiler stuff like this with the "one function with a huge switch on the
expression type" approach, curious to see what the more OOP-ish way looks
like.
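
for contrast, the switch-style approach i mean looks roughly like this (a toy
sketch, not from either library):

```python
# Toy sketch of the "one function with a huge switch on the expression type"
# codegen style -- illustrative only, not taken from convtools or any
# real compiler.
def gen_code(expr, var):
    kind = expr[0]
    if kind == "item":        # ("item", key) -> var[key]
        return f"{var}[{expr[1]!r}]"
    elif kind == "call":      # ("call", func_name, sub_expr) -> func(sub)
        return f"{expr[1]}({gen_code(expr[2], var)})"
    elif kind == "const":     # ("const", value) -> literal
        return repr(expr[1])
    raise ValueError(f"unknown expression type: {kind}")

gen_code(("call", "len", ("item", "a")), "data")
# "len(data['a'])"
```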

btw: wow, that cheatsheet is exactly what i had in mind in my first comment,
that's the kind of stuff i'd like in a readme! maybe include a few excerpts
with a link to the whole thing.

some more remarks if you're interested:

---

in that cheatsheet it'd be cool to also show the generated code for each
example, maybe in a collapsible box or sth – in that context the actual
semantics of a convtools expression are useful to know.

---

have you thought about some magic syntactic sugar? the current approach is
kind of visually heavy, since you're basically writing an AST by hand. with
some __dunder__ hacking you could easily (?) add a "magic" api like

    
    
      from convtools.magic import magic as m
      
      c.item('key') ->
      m['key']
      
      c.call_method('foo', ...) ->
      m.meth.foo(...)
      
      c.call_function('bar', ...)  ->
      m.func.bar(...)
    

or something similar. it might be a bit too magical for some tastes, but if
you're constructing a python expression, it kind of makes sense to use python
syntax for that.
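
a rough sketch of the dunder hacking i mean (hypothetical -- nothing like this
exists in convtools, the `Magic` class and its methods are made up):

```python
# Rough sketch of a "magic" expression-builder via dunder methods
# (hypothetical API, not part of convtools).
class Magic:
    def __init__(self, path=()):
        self.path = path
    def __getitem__(self, key):
        # m['key'] records a subscript step
        return Magic(self.path + (("item", key),))
    def __getattr__(self, name):
        # m.foo records an attribute step (only called for unknown names)
        return Magic(self.path + (("attr", name),))
    def gen_code(self, var):
        code = var
        for kind, value in self.path:
            code += f"[{value!r}]" if kind == "item" else f".{value}"
        return code

m = Magic()
m["key"].gen_code("data")       # "data['key']"
m["key"].foo.gen_code("data")   # "data['key'].foo"
```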

~~~
westandskif
oh, thanks, I'll add links to the cheatsheet and quickstart pages to the
README, it really makes sense!

As for the magic stuff, I was contemplating designing the API with this
approach, but I changed my mind because it would be difficult to tell which
Python expressions are evaluated at the moment of a conversion definition AND
which run in the compiled code.

However if we imagine this "magic" API, then it could be even closer to normal
python code:

    
    
      m["key"].some_method(...)
    

which would resolve everything under the hood.

===

as for the collapsible generated-code examples -- I've jotted it down :)

~~~
uryga
> it would be difficult to tell which python expressions are evaluated at the
> moment of a conversion definition AND which in the compiled code

this is ofc a valid concern in any metaprogramming situation. has this been a
problem in your experience? i'm guessing e.g. generating a conversion based on
a list of fields is a thing someone might do, but it feels like a minority
use case (at least to me, someone with no actual experience using the
library :p)

~~~
westandskif
Re: whether it's been a problem in my experience -- sort of, yes.

So now I'm doing my best to observe the 2nd commandment of PEP 20, with the
hope that I'm not badly violating the 1st :)
[https://www.python.org/dev/peps/pep-0020/](https://www.python.org/dev/peps/pep-0020/)

I also see another upside of the no-magic syntax: it is distinctive --
there's no way to mix up convtools-related code with any other Python code.

------
gwenzek
Interesting approach. I'm currently not satisfied with Pandas, which seems to
be the de facto tool for processing tables. But I find the query API really
unnatural, especially for filtering.

Do you have any performance benchmarks? Is this more aimed at playing
around in a notebook, or at being used inside a full data processing pipeline?

~~~
duckmysick
What doesn't satisfy you in Pandas?

~~~
akdor1154
not op, but the API does not feel coherently designed, with the same sort of
complete-but-hard-to-learn vibe as php's standard library.

there are no mypy module stubs, so ide autocomplete generally just doesn't work
(and likely never will work properly, as the API is often inconsistent in its
return types based on what you pass to it)

The docs are detailed, but most of the meat is in great long module-level
manual pages, which are difficult to use as a quick reference. basically I
have been using pandas for about a year now and I still hit around one multi-
hour-long 'how do I do this seemingly basic operation' dive into stack
overflow/GitHub/etc per week.

pandas code itself is very difficult to understand, due to being based around
weird python metaprogramming mixin patterns and needing to do a fair amount of
optimised stuff in cython anyway.

with that said, I have still been using Pandas for a year and it lets me do my
job, so hey, it's not all bad. designing a general-purpose api like this
correctly the first time is probably impossible, and I'm really grateful for
the work the pandas devs have achieved.

------
carapace
Pretty awesome.

What about errors at conversion time? Is there any help for that or do you
just get the raw traceback of the generated code?

~~~
westandskif
On exception it populates linecache, so tracebacks are normal and you can
debug with pdb post-mortem debugging -
[https://docs.python.org/3/library/pdb.html#pdb.post_mortem](https://docs.python.org/3/library/pdb.html#pdb.post_mortem)

Nothing else at the moment, but I've written down this point to contemplate in
the nearest future -- thank you!
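
The mechanism is roughly this (a generic sketch of the stdlib trick that
codegen libraries use, not convtools' exact code):

```python
import linecache

# Generic sketch of how codegen libraries make tracebacks readable:
# register the generated source under a fake filename in linecache,
# so tracebacks (and pdb) can display the offending lines.
source = "def converter(data):\n    return data['a'] + 1\n"
filename = "<generated:converter>"

# linecache cache entries are (size, mtime, lines, fullname);
# mtime=None means checkcache() will never evict the entry.
linecache.cache[filename] = (
    len(source),
    None,
    source.splitlines(True),  # keep the trailing newlines
    filename,
)

namespace = {}
exec(compile(source, filename, "exec"), namespace)
namespace["converter"]({"a": 41})          # 42
linecache.getline(filename, 2).strip()     # "return data['a'] + 1"
```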

~~~
carapace
Ah, that's cool. Cheers!

------
birdculture
Looks neat! Would you ever consider adding a mode that skips code generation?

~~~
westandskif
It's not possible to skip the code generation part, because the resulting
converter is always compiled from code written under the hood. Could you
please share what your concern about it is? I'd really like to better
understand it!

JFYI: it's possible to skip calling the "gen_converter" method and just use
"execute" -- it runs "gen_converter" under the hood:

    
    
      c.group_by(
          c.item(0)
      ).aggregate({
          c.item(0): c.reduce(c.ReduceFuncs.Sum, c.item(1))
      }).execute([
          (0, 1), (0, 2), (0, 3),
          (1, 10), (1, 12),
      ])
      
      Out[5]: [{0: 6}, {1: 22}]
    

The downside is that you won't be reusing the converter.

------
pyuser583
Looks promising!

