
How Python Makes Working with Data More Difficult in the Long Run - BerislavLopac
https://medium.com/@jeffknupp/how-python-makes-working-with-data-more-difficult-in-the-long-run-8da7c8e083fe
======
andybak
Leaving aside the static/dynamic debate - there's no attempt here to suggest
that the issues over readability and maintainability could be mitigated. He's
just shown the simplest Python implementation, waved his hands and then said
"Hey! Let's use Go".

I'm sure there are plenty of reasons to use Go, but I'm not sure this article
makes a terribly good case for it in this scenario.

How about using

------
aikah
While static typing surely helps, Go has its own issues working with
polymorphic JSON, and don't get me started on working with XML. I have never
liked struct tags. Why would a statically typed language rely on this
fragile feature? It's often a source of bugs, since tags are just strings that
need to be parsed at runtime, with reflection, and frankly the std lib struct
tag parser is extremely fragile. Go is full of these things. In order to make
the compiler go fast, the developer has to do the compiler's job, at runtime.
That's not right.

If some external data doesn't fit Go's type system, then Go makes it extremely
hard to work with that data. Same thing with SQL: it's extremely tedious with
Go when fields can be null.

------
StrykerKKD
If you want better type information for Python, you could use the new type
hints and the typing module
([https://docs.python.org/3/library/typing.html](https://docs.python.org/3/library/typing.html)).

You could also use mypy
([https://mypy.readthedocs.io/en/latest/index.html](https://mypy.readthedocs.io/en/latest/index.html)),
which is a static type checker for Python.
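
A minimal sketch of what that might look like. `Reading`, `parse_reading`, and `doubled` are made-up names for illustration, not anything from the article:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    # Hypothetical record type, used only for illustration.
    sensor_id: str
    value: Optional[float]  # None means "the value is missing"


def parse_reading(raw: dict) -> Reading:
    """Build a Reading from a decoded JSON object, failing loudly on bad types."""
    value = raw.get("value")
    if value is not None and not isinstance(value, (int, float)):
        raise TypeError(f"value must be a number or null, got {type(value).__name__}")
    return Reading(
        sensor_id=str(raw["sensor_id"]),
        value=None if value is None else float(value),
    )


def doubled(reading: Reading) -> float:
    # A checker such as mypy rejects a bare `reading.value * 2` here, because
    # value may be None; the explicit check below is what lets it pass.
    if reading.value is None:
        raise ValueError("missing value")
    return reading.value * 2
```

The annotations do nothing at runtime on their own. The runtime check in `parse_reading` is a separate, hand-written step, and mypy only reports problems when you actually run it over the code.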

------
staticassertion
This was the subject of a talk I gave with a co-speaker at RustConf:
effectively, the shortcomings of dynamic type systems when it comes to
building data-science-oriented services.

You mention that it's much more laborious to write a struct that matches your
data exactly. However, I've found that at this stage in the process you
already have some sort of schema (mental or formal) of what your data is, and
it's fairly trivial to write out a structured representation of it.

What this then buys you is a semantic understanding, in your code, of what
'bad data' is and how it's handled. You know that if you get a string where
you expected an int, that's a problem. You also know that if data is missing
(e.g. null), you'll handle it.

In Python, a function can erroneously return None. To your data, this is just
a missing value, not a bug. But there is a semantic difference, and a cost
associated with that bug.
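
A made-up illustration of that difference (`lookup_age` is a hypothetical helper, not code from the talk or the article):

```python
def lookup_age(record):
    # Hypothetical helper, for illustration only.
    if "age" not in record:
        return None  # intended: the field is genuinely missing
    age = int(record["age"])
    if age >= 0:
        return age
    # Bug: nothing handles a negative age, so the function falls off the
    # end here and Python implicitly returns None.


missing = lookup_age({"name": "a"})           # legitimately missing
buggy = lookup_age({"name": "b", "age": -5})  # a bug, but it looks the same

# Downstream code cannot tell the two cases apart.
indistinguishable = missing is None and buggy is None
```

Both calls hand back `None`, so whatever consumes the result treats the bug as one more missing value.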

All of these problems, limited strictly to working with data, are far easier
to react to and deal with in a static language.

We absolutely conceded the exploratory phase to Python. It is, and will likely
remain, an excellent and near-ideal tool for that phase.

~~~
andybak
> In Python, a function can erroneously return None.

If it hurts, stop doing it. I can think of several ways to solve this
ambiguity. Off the top of my head:

1. Create a custom class to represent your data's 'None' and treat the real
'None' as an error.

2. Stick to a fairly reasonable convention that all functions and methods are
terminated with a single return statement. This would make it pretty hard to
get bitten by the implicit None (which has never actually bitten me in the
real world).
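
A rough sketch of the first option. The names `Missing`, `MISSING`, `get_field`, and `checked` are all invented here for illustration:

```python
class Missing:
    """Sentinel type meaning 'the data has no value here' (hypothetical name)."""

    _instance = None

    def __new__(cls):
        # Keep a single shared instance so `is MISSING` checks work.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "MISSING"


MISSING = Missing()


def get_field(record, key):
    """Return the field, or MISSING when the source data has no value for it."""
    value = record.get(key)
    return MISSING if value is None else value


def checked(value):
    """Treat a real None as a programming error rather than as missing data."""
    if value is None:
        raise AssertionError("got None: a function probably forgot to return")
    return value
```

With this convention, `get_field({}, "x") is MISSING` holds for absent data, while `checked(None)` raises, so an accidental `None` surfaces as an error instead of passing for a gap in the data.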

~~~
staticassertion
Absolutely, and we did this in fact in some of our code. But at that point
you're solving the problem with types, which is my point. And it involves a
pattern, not something that's enforced across a codebase.

~~~
johnobrien1010
It seems like the argument is because the language doesn't force you to use
static types, in the long run it could be bad.

But since it provides you the option to use static types, I think the
limitation is more in whether you do so or not, not in the language. If
anything, by not forcing you to use static types, the language provides more
options, which seems like a good thing...

~~~
staticassertion
> It seems like the argument is because the language doesn't force you to use
> static types, in the long run it could be bad.

Not so much 'in the long run' but more so 'as soon as you end the exploratory
phase'.

> I think the limitation is more in whether you do so or not, not in the
> language

Perhaps if the type systems were equal in power, but they are not.

> If anything, by not forcing you to use static types, the language provides
> more options, which seems like a good thing...

The problem here is that touching an untyped interface can break inference,
and break assumptions about types, meaning that by having the type system be
optional you potentially poke holes in it.
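
One hedged example of such a hole, assuming mypy with default settings (the function names here are made up). `json.loads` is annotated as returning `Any`, so anything derived from it passes the checker regardless of what the annotation on your own function claims:

```python
import json
from typing import Any, List


def total(prices: List[float]) -> float:
    return sum(prices)


def load_prices_unchecked(text: str) -> List[float]:
    # json.loads is typed as returning Any, so by default a checker accepts
    # this return without complaint, whatever the JSON actually contains.
    data: Any = json.loads(text)
    return data["prices"]


def load_prices_checked(text: str) -> List[float]:
    # Validating at the boundary restores the guarantee the annotation claims.
    data = json.loads(text)
    prices = data["prices"]
    if not isinstance(prices, list) or not all(
        isinstance(p, (int, float)) for p in prices
    ):
        raise TypeError("prices must be a list of numbers")
    return [float(p) for p in prices]
```

Given `'{"prices": ["1", "2"]}'`, the unchecked loader happily returns a list of strings, and the failure only shows up later when `total` tries to add them. The checked loader fails at the boundary instead.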

But to be clear, I think that the typed Python approach is valid, and can be a
great way to help with these problems. I am not saying to ditch Python and go
with another typed language, just saying that the untyped approach has
caveats. Maybe you're right and optional systems are better, but I think the
point to get across is that a typed understanding of your data helps a lot.

------
mmirate
> In Go, for example, to parse and return a JSON response from some web API,
> you first need to create a struct whose fields and field-types exactly match
> the structure of the response.

Right, but why can't these declarations be written by some Python code? (with
a large set of examples of responses as input)
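
Something along these lines, as a rough sketch only. It assumes flat JSON objects with simple snake_case keys, and `infer_fields` and `go_struct` are names invented for this example:

```python
import json

# Mapping from Python JSON types to Go type names (a simplification).
GO_TYPES = {bool: "bool", int: "int64", float: "float64", str: "string"}


def infer_fields(samples):
    """Collect, per key, the set of Python types seen across all samples."""
    seen = {}
    for sample in samples:
        for key, value in sample.items():
            seen.setdefault(key, set()).add(type(value))
    return seen


def go_struct(name, samples):
    """Emit Go source for a struct matching flat JSON objects."""
    lines = [f"type {name} struct {{"]
    for key, types in sorted(infer_fields(samples).items()):
        # A field is optional if it was ever null or ever absent.
        optional = type(None) in types or any(key not in s for s in samples)
        concrete = types - {type(None)}
        if concrete == {int, float}:
            concrete = {float}  # mixed ints and floats widen to float64
        if len(concrete) == 1 and next(iter(concrete)) in GO_TYPES:
            go_type = GO_TYPES[next(iter(concrete))]
            if optional:
                go_type = "*" + go_type  # pointer so null can be represented
        else:
            go_type = "interface{}"  # mixed or nested: fall back to untyped
        field = key.title().replace("_", "")
        lines.append(f'\t{field} {go_type} `json:"{key}"`')
    lines.append("}")
    return "\n".join(lines)


samples = [
    json.loads('{"id": 1, "name": "a", "score": 1.5}'),
    json.loads('{"id": 2, "name": null, "score": 3}'),
]
print(go_struct("Response", samples))
```

For the two samples above this emits a struct with `Id int64`, `Name *string`, and `Score float64`. The result is only as good as the examples fed in: a field that happens never to be null in the sample set will be declared non-nullable.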

------
willtim
Can we please see a comparison with a statically typed language better suited
to data science? Go is a long way from the current state of the art in
statically typed languages. I would hope that F# with its type providers has a
pretty good exploration story and possibly Scala too.

~~~
doug1001
Completely agree. It never occurred to me to use Golang for data-focused
applications.

Among the most important criteria for me are:

parametric polymorphism (e.g., a typed vector); immutable data structures; and
exception handling (preferably monadic)

and I don't believe Golang has any of these.

As you mentioned, if one is looking for a statically typed data science
language, Scala is hard to miss. Apache Spark is probably the most
widely used "big data" tool at the moment; it's written in Scala, and while
the Python API is likely far more widely used, new features hit the Scala API
first.

(I don't know F#, but aside from the feature you mentioned, R copied its pipe
operator, |>, which I also see in a lot of Scala code, usually implemented as
a type class.)

------
samuell
Reporting quite a similar experience to the one Daniel Whitenack spoke about
in his GopherCon talk about Go for Data Science:

[https://www.youtube.com/watch?v=D5tDubyXLrQ](https://www.youtube.com/watch?v=D5tDubyXLrQ)

See also his answers to some follow-up questions:

[http://www.datadan.io/common-go-for-data-science-questions/](http://www.datadan.io/common-go-for-data-science-questions/)

... and his O'Reilly article on the topic:

[https://www.oreilly.com/ideas/data-science-gophers](https://www.oreilly.com/ideas/data-science-gophers)

~~~
willtim
How could Go possibly be anything but an inappropriate choice? For numerics and
working with data, Go would be about as expressive as Fortran or C, but
slower. I took a quick look at Gonum; it's highly imperative and monomorphic,
just like Fortran. It makes me sad that these efforts could be spent improving
the ecosystems of Haskell/OCaml/F#/Scala. These languages are far more
expressive than Go and offer far more type safety, all with the succinctness
of Python.

------
rini17
We would not have this conversation... if Excel supported big data :D :D :D

------
kylebenzle
Use R

~~~
c06n
Serious question: Exactly how would R help here? Do you have an example
perhaps?

