echo "jQuery('#header-container').hide();" >> ~/.jupyter/custom/custom.js
FWIW, I thought the comment was pretty funny, and the first answer was even better.
The two major uses I have are prototyping nearly any coding project that requires Python and teaching myself data analysis. This has saved me hours if not days because the feedback loop is so fast. When I code in other languages I miss the IPython interface.
It's not specifically IPython, but Jupyter does support many other languages through kernels!
I've been doing a bit of python the last few weeks for some image processing/computer vision tasks (using opencv and numpy).
I have to say, all together it's a pretty miserable developer experience.
Python is incredibly slow -- forcing pretty much all computation, no matter how trivial, into contortions via numpy incantations so that the inner loops can run in native extensions -- and these incantations have a lot of implicit, poorly documented magic. Miss some detail in the behavior of the magic and suddenly you have a major 10x slowdown -- but good luck finding where. I would kill for an easy-to-use tool like Xcode's Time Profiler ...
API usage errors (even those where invariants are checked at runtime) are ridiculously under-informative -- OpenCV, for example, does quite a bit of runtime sanity checking on the shape and type of arguments to its methods -- but somehow even simple details as to which parameter caused the error don't get reported in the stack trace, severely increasing the cognitive load required to identify the mismatch -- not fun when multiple arguments to an API are the result of a chain of numpy data-munging magic. This may be an OpenCV complaint more than a Python one (aside: OpenCV is pretty terrible.)
I'm not sure what I'm doing wrong with Python, but I find the majority of my code to be menial data munging -- and I haven't figured out good patterns to organize this munging in any sensible way. With a static language, DRY patterns to centralize such plumbing operations have the awesome effect of moving invariants into reasonable places -- in Python, without any ability to organize guarantees, I find myself needing to repeatedly check data shapes/types as the code base evolves; there doesn't seem to be an obviously useful way to organize verification of data types as the necessary invariants become apparent. These issues are compounded by the fact that refactoring is an enormous pain in the ass!
I feel like all my python code is throwaway code. Maybe that's what I'm missing -- I need to just accept that all the python numeric code I write is pure one-off junk, embrace copy paste and never try to reuse any of it ...
Sorry for the rant! I remember loving dynamic languages when I first discovered them -- but right now, I really miss C++ (or, even better, Swift).
I can't imagine the number of hours wasted because of these overly dynamic tools -- and there is simply no recovering that lost time in the future. As these languages grow, if the house-of-cards ecosystems they sit atop grow and motivate more use, then ever more developer hours will be lost to avoidable triviality ...
> forcing pretty much all computation no matter how trivial into contortions via numpy incantations so that the inner loops can run in native extensions
Welcome to the development of Matlab-style code. It can be pretty disheartening to see a complicated hand-crafted algorithm replaced by, say, a matrix multiplication. Try to find Matlab/numpy implementations from paper authors. This programming style is a bit difficult to get used to initially. But it will become incredibly powerful as you start thinking in algebraic operations.
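To make the parent's point concrete, here is a toy sketch (my example, not the commenter's) of a hand-written loop collapsing into a single algebraic operation:

```python
import numpy as np

rng = np.random.RandomState(0)
A = rng.rand(200, 300)
x = rng.rand(300)

# Hand-rolled version: accumulate one output element at a time.
out_loop = np.zeros(200)
for i in range(200):
    for j in range(300):
        out_loop[i] += A[i, j] * x[j]

# The same computation, thought of algebraically: a matrix-vector product.
out_mat = A.dot(x)

print(np.allclose(out_loop, out_mat))  # True
```

The second form is both shorter and runs in optimized native code instead of the Python interpreter.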
Also, don't do debug runs on the full data. Implement ideas with a subset that computes fast. Then simplify the code and make it fast. Then run it on the full dataset.
If you absolutely need to combine linear algebra operations with fast imperative code, have a go at the Julia language, or use a C++ library like Eigen or Armadillo.
In a practical sense, the libraries that make up the Python scientific stack -- numpy, pandas, matplotlib, sklearn, etc. -- are all akin to domain-specific languages, each with its own magic behavior, quirks, and recommended best practices that one must master. It takes a long while to become proficient in all of them. The good news is that there's a large, growing community of helpful people using these tools, so you can usually find the answer to any question with a single Google search.
For numpy specifically, if you are interested in getting the most out of it with the fewest possible headaches, I would recommend the following online book: http://www.labri.fr/perso/nrougier/from-python-to-numpy/
Otherwise, please make sure to read bmarkovic's spot-on comment in this thread: https://news.ycombinator.com/item?id=14158597
From a Stack Overflow answer (https://stackoverflow.com/questions/13432800/does-performanc...): "I remember I read somewhere that performance penalty is <1%, don't remember where. A rough estimate with some basic functions in OpenCV shows a worst-case penalty of <4%".
I'm working on a project that downloads a 2 hour video, frame by frame. Then after, I run image detection on a photo or multiple photos against every single one of those frames.
My code was slow as shit at first. I decided to profile and realized the only speed hog was SIFT's detectAndCompute(). It was taking ~0.5s for each frame passed to the algorithm.
So I ended up trading memory for speed and now create huge PyTables loaded with every single frame's KeyPoints and Descriptors. I do this when I first download the video's frames, and even though that takes a while, I can now run image detection against 6000 frames (100 minutes of video) with however many photos in about 24 minutes.
Point is, even just using the built-in cProfile helped a lot because there was only one function that was truly affecting my Python. Other than that, I love the Python translation. I'm surprised you have memory problems, though. OpenCV always murdered my CPUs but was never really memory intensive.
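For anyone who hasn't used it, a minimal cProfile session looks something like this (slow_square is a made-up stand-in for a hot spot like detectAndCompute):

```python
import cProfile
import io
import pstats

def slow_square(n):
    # Deliberately naive: repeated addition instead of n * n.
    total = 0
    for _ in range(n):
        total += n
    return total

def pipeline():
    return [slow_square(i) for i in range(2000)]

profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Sort by cumulative time and show the top offenders.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print("function calls" in stream.getvalue())  # True
```

In a real run, slow_square dominates the cumulative column, which is exactly the kind of signal that points you at the one function worth optimizing.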
The reason why python is the right tool for this job is because it has such libraries for just about everything, from serial communications to a bunch of hardware driving stepper motors, relays and reading inputs to imaging and number crunching and all the other bits that went into this. I'd have a much harder time achieving the same effect in any other language and likely it would have cost me a lot more time.
Python is not perfect (far from it) but it gets the job done.
I appreciate this comment as well -- and indeed I will agree that my expectations are part of the problem.
I'm not actually doing 'performance critical' work -- but rather prototyping tasks -- the speed of my feedback cycle is the only thing that's performance sensitive within the context of my rant -- and this includes time to run the code and the time to debug it.
I should also acknowledge other (self-induced) problems including the need to run code on a remote server for extra hardware capability vs my pitiful laptop.
I should acknowledge that part of my frustration could come as much from my development process as from the inherent nature of the underlying tools themselves ... I've experimented with my process -- trying to find something that works for me -- and I've ended up with this hobbled kludge of tools:
- a script that re-runs rsync on file changes
- atom + hydrogen extension connected via ssh to a jupyter kernel_gateway
- some ssh terminals where I run longer running code from command line
- an sshfs mount of a remote directory on the server for viewing some output artifacts ...
Avoidable runtime errors after longish-running processing tasks are a very frustrating time-sink ...
1. If you are prototyping and prefer working on a remote server, have a look at Azure Notebooks (https://notebooks.azure.com/); memory is limited to 4 GB, but it's free, and Jupyter notebooks are great for prototyping.
2. For prototyping, try REPL-driven development: be OK with just playing around with the library API before you write a longish-running processing task; that reduces (but doesn't remove) the chances of a runtime error. Jupyter notebooks excel at this too, as you can just try out code in a cell, learn from it, rinse and repeat. You can also use your IDE to send code fragments to the REPL and get immediate feedback on them. This style of iterative development keeps your feedback cycle fast. If Python's feedback cycle seems slower to you than C++'s, you are definitely not using the REPL enough :)
3. If debugging is a pain point, definitely give an IDE a try; I prefer Visual Studio as I am on Windows. You can very well go with PyCharm or VS Code (not technically an IDE, but it has a debugger, so there's that). I personally prefer a Jupyter notebook, or Emacs with my code and REPL in split windows, for prototyping -- different strokes for different folks.
I personally love working with python, I even write my blog posts in a Jupyter Notebook so although it isn't perfect, it doesn't have to be frustrating either :)
Hope my suggestions can be of some help!
Python frequently saunters into the second territory. Well, I guess there are tools available to profile memory usage and stuff, but if I had to spend that much effort tracking down memory issues, I might as well rewrite it in C++. (It doesn't help (or it helps?) that I'm much more comfortable with C++ than Python. YMMV.)
Take this gem from a well-known course:
np.concatenate([x.next() for i in range(x.nb)])
I think we need a better abstract language for checking matrix algebra operations.
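For what it's worth, that one-liner gets less magical once unpacked. Here's a sketch with a stand-in iterator (the real `x` in the course is some batch object with an `nb` count and a `next()` method; FakeBatches below is my invention):

```python
import numpy as np

class FakeBatches(object):
    """Stand-in for the course's batch iterator: nb batches, next() yields one."""
    def __init__(self, batches):
        self._batches = iter(batches)
        self.nb = len(batches)
    def next(self):
        return next(self._batches)

x = FakeBatches([np.zeros((2, 3)), np.ones((2, 3))])

# The one-liner, spelled out step by step:
batches = []
for _ in range(x.nb):             # pull exactly nb batches ...
    batches.append(x.next())      # ... one array per call
result = np.concatenate(batches)  # stack them along axis 0

print(result.shape)  # (4, 3)
```

Written this way, the shape contract (each batch contributes rows along axis 0) is at least visible.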
Option 1: F# or C# with the Deedle library. Deedle provides series and dataframe classes, along with various stats functions. I believe that there are also some vis tools. Type providers in F# allow you to specify a data source, such as a CSV or database, and then not only infer the types but also give you intellisense autocompletion. See the F# guide for data science for more info.
Option 1.5: I hesitate to recommend the following, because there's simply not much here yet, but there is a dataframe library for Nim. Nim is a strongly typed language with a Python-like syntax which compiles to C and is apparently quite fast. It has multiple options for garbage collection but also supports manual memory management. It offers lisp-like macros for implementing DSLs, which the dataframe library I mentioned uses quite a bit. The main problems with Nim are of course the lack of libraries and the need for a notebook-like environment such as Jupyter, which are certainly big problems indeed. But I think that Nim is something to look out for over the next few years.
As much as I like Deedle and F#'s features, I've personally decided to abandon the use of Microsoft technologies due to their many user-unfriendly actions regarding Windows 10 and privacy. I don't fault anyone else for using Deedle, though, because it is a nice tool. This is just a personal decision of mine.
Python IS strongly typed. It is also dynamically typed.
As for myself, the first serious language I taught myself was Python, so I'll always have a soft spot for it. Now that I've begun to realize Python's shortcomings, I've started to work more with statically typed languages; my favorites are Scala and C++, and my next goal is to work with Rust.
Personally, I love Python. It's been working fairly well for our ML projects utilizing scikit-learn to process fairly large spatiotemporal datasets (temporally varying 2D and 3D datasets). I find numpy to be critical for keeping runtimes reasonable.
Regardless of the language, profiling may be necessary in order to obtain acceptable performance.
Also, Python has plenty of abstraction capabilities, just like C++. There's no reason all your code should be throwaway. Make use of classes, magic methods, list comprehensions, dicts, and the pandas library (lots of powerful abstractions for managing data in there).
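As a sketch of what that can look like (a made-up ImageBatch class, not from any particular library), centralizing a shape invariant in one class means downstream code never has to re-check it:

```python
import numpy as np

class ImageBatch(object):
    """Small wrapper that pins down shape invariants in one place."""

    def __init__(self, pixels):
        pixels = np.asarray(pixels)
        if pixels.ndim != 3:
            raise ValueError("expected (n, height, width), got %r" % (pixels.shape,))
        self.pixels = pixels

    def __len__(self):  # magic method: len(batch) works
        return self.pixels.shape[0]

    def normalized(self):
        # Downstream code can rely on the invariant checked in __init__.
        return ImageBatch(self.pixels / 255.0)

batch = ImageBatch(np.zeros((10, 32, 32)))
print(len(batch))                # 10
print(len(batch.normalized()))   # 10
```

Every function that accepts an ImageBatch instead of a raw ndarray gets the shape guarantee for free.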
PyCharm Professional Edition comes with a pretty nifty profiler.
Now, I would kill for the ability to compare two profiles, but putting them on two different tabs will do for now ...
Python is very popular for the use cases you describe, but I have long felt it's akin to using a Honda Civic that has been souped up to be a specialized race car. You can definitely make a Honda fast around a racetrack, but that's not what they were designed to do. You're better off using a purpose built machine.
That being said, a purpose built racecar is harder to drive, doesn't have AC, needs more maintenance, has more expensive tires that wear out faster, etc...
Nothing in life is free, there is always a trade off.
With that in mind, you can try to write your numerics code while hiding, as much as possible, the fact that you use numpy. I have found that this is the only barely bearable way to do numerics/image processing in Python.
In : x = np.array([1,2,3])
In : x[0:500]
Out: array([1, 2, 3])
However, after translating some exercises from Octave to Python/numpy, I have experienced a great deal of frustration. A simple three-line solution in Octave becomes a fifteen-line behemoth that does not even work consistently across different versions of the system.
It sounds like you're mostly having trouble with OpenCV, which has a horrid API; the Python version adds another layer of confusion on top of it, and dumping things back and forth between OpenCV Mat objects and numpy arrays sucks and wastes time. The documentation is also fairly bad. The error messages, like you say, kinda suck. So I can't help you much with that, but know that the OpenCV Python bindings fall WAY below the standard of what's considered good. Stuff like SimpleCV is a bit better but doesn't cover nearly as much as OpenCV does.
Numpy itself is great though, for how little code you often need for a complex, vectorized matrix operation. I've used stuff like Eigen (or even boost matrices or raw BLAS stuff) which I think is way more painful to deal with. For the annoying cruft that any of those bring, the minor speed boost over numpy isn't worth it for me most of the time.
The other thing is, when folks complain about the slowness of python when doing computationally heavy stuff like this, it's often the situation that they have a bunch of vectorized operations, and in the middle somewhere, they stick in something like a regular python for loop, which is orders of magnitude slower. The trick, especially for hot loops, is to just avoid using anything but vectorized operations. In something like C++ of course you're less likely to notice this happening because there is much less overhead, but there'd still be an impact of doing stuff the non-vectorized way anyway.
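A quick illustration of the gap (toy example; exact timings will vary by machine):

```python
import time
import numpy as np

a = np.random.rand(1000000)

# A plain Python loop dropped into the middle of a numeric pipeline:
t0 = time.time()
out_loop = np.empty_like(a)
for i in range(a.size):
    out_loop[i] = a[i] * 2.0 + 1.0
t_loop = time.time() - t0

# The same step as a single vectorized expression:
t0 = time.time()
out_vec = a * 2.0 + 1.0
t_vec = time.time() - t0

# Identical results; the loop is typically orders of magnitude slower.
print(np.allclose(out_loop, out_vec))
print("loop: %.3fs  vectorized: %.3fs" % (t_loop, t_vec))
```

The per-element interpreter overhead (attribute lookups, boxing floats) is what the vectorized form avoids.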
For profiling, take a look at this: https://www.huyng.com/posts/python-performance-analysis, plus the built-in cProfile module that does call-graph profiling. It's nothing amazing like VTune or whatever, but it gets the job done.
As far as architecture goes - the relative lack of typing structure in python definitely makes life easier in many ways and cuts down on boilerplate, but it also makes it much easier to shoot yourself in the foot like you say. One thing to keep in mind is that when writing data heavy code there is an additional layer of concerns in play. Stuff like the "shape" of your data. It sounds like if you're running into this, what you want is to make a big diagram / graph of how data is flowing through your code, each stage of computation, and what the data looks like before and after. When you understand this well, it's much easier to reason about the code.
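One lightweight way to encode that "shape of the data" knowledge is a small assertion helper at stage boundaries -- expect_shape below is a hypothetical utility, not a standard function:

```python
import numpy as np

def expect_shape(arr, shape, name="array"):
    """Assert an array's shape, with None as a wildcard dimension.

    Hypothetical helper: centralizing checks like this gives each
    pipeline stage one obvious place to state what it expects.
    """
    arr = np.asarray(arr)
    if len(arr.shape) != len(shape) or any(
        want is not None and got != want
        for got, want in zip(arr.shape, shape)
    ):
        raise ValueError("%s: expected shape %r, got %r" % (name, shape, arr.shape))
    return arr

def grayscale(image):
    image = expect_shape(image, (None, None, 3), "image")  # H x W x RGB
    return image.mean(axis=2)

gray = grayscale(np.zeros((4, 5, 3)))
print(gray.shape)  # (4, 5)
```

Failures then surface at the boundary where the assumption was stated, not deep inside some numpy expression.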
One thing that helps to know is that the style of programming in the Python world is a bit different. In C++ land, the instinct is to start by building architecture, and when you inevitably realize that you've misunderstood the problem domain, you do a massive refactor. This usually constitutes large changes in how your code is structured, but C++ gives you tools, like strong types, to do this rather easily. With Python, unless you're setting off on a massive project, you start with minimal architecture. Write code that does the thing you need. Then, as you find yourself repeating yourself, doing unnecessary things, or dealing with complexity you don't need to know about, only then do you start giving structure to your code bit by bit, as you need it. It's a different philosophy that's the result of a different set of functionalities.
In the larger context, I mostly have the opposite experience to yours -- I can pull out C++ whenever I need it, but I almost never do. The set of tools I have for computation and analysis in Python is so much richer. For example, imagine C++ code for reading an Excel sheet, scraping some data off the web, cleaning both, joining them together, running a complex mathematical calculation, and saving a few plots. In Python I can do this from scratch in half an hour, in around 100 lines of code. I also have an interactive shell in which I can explore the data and the libraries I have, and I'm not stuck in the code-compile-code-compile loop. The productivity boost is amazing. In contrast, I don't even want to imagine what it would be like to attempt this in C++. Python is great glue. Admittedly, you're pushing at the edges of what Python is useful for. It's definitely possible (I've written computer vision stuff with Python before, and it worked OK), but if you're not making use of any of these benefits of Python, you might as well use C++ and enjoy the speed and strong typing.
Hope this helps your python experience be a bit smoother!
Also, do you use pandas, and why not?
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:53:06)
Type "copyright", "credits" or "license" for more information.
IPython 5.1.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
Enabling tab completion
There's no good reason to prefix it that way.
Pip can handle all sorts of requirements including Python version, OS, etc. pip install ipython currently grabs ipython-5.3.0 on Python 2.7.
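For reference, this is the kind of thing pip's environment markers (PEP 508) express in a requirements file -- the version pins here are just illustrative:

```
# requirements.txt -- markers select the right build per interpreter
ipython<6 ; python_version < "3.3"
ipython>=6 ; python_version >= "3.3"
```

With markers like these, the same requirements file installs a Python 2-compatible IPython on old interpreters and the current one elsewhere.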
The one without a version number should be a symlink instead. RPM and DEB support that.
Then 'python' can read the file, and execute it under the right language.
The problem is if we ever switch 'python' to 'python3', then there will be 5 or so years when 'python' becomes unusable, as you won't know on a given machine if it's python2 or python3, and almost no program works on both.
This kind of already happened. Arch already has Python 3 as 'python', and if you use any of the popular environment tools (virtualenv, pyenv, conda), whichever version of Python you pick when creating an environment will be 'python' inside that env.
It's not that big a problem, though. It's quite possible to write code that runs on both Python 2 and 3, and it's surprisingly rare (in my experience) to call unqualified 'python' rather than using an explicit path.
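A minimal sketch of what 2+3-compatible code tends to look like -- for simple modules the __future__ imports do most of the work:

```python
# A module written to run unchanged on Python 2.7 and 3.x.
from __future__ import absolute_import, division, print_function

import sys

def greet(name):
    # print_function gives the 3.x print() on 2.7, and true division
    # makes 1 / 2 == 0.5 under both interpreters.
    print("hello, {0} (running on Python {1})".format(name, sys.version_info[0]))
    return 1 / 2

assert greet("world") == 0.5
```

Libraries like six cover the harder cases (string types, metaclasses, renamed stdlib modules).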
I really like that I can install command line tools and then just reference: <virtual env>/bin/<tool name> and it will run it with the correct interpreter.
If you use setuptools correctly it is fairly easy to install and run packages e.g.:
my_env/bin/pip install mypackage
my_env/bin/pip install mypackage.whl
I sometimes send people Python scripts, and I can just tell them to run them from the command line. I can assume any Linux distro or Mac OS X install will have some version of python2, so I can just say "run 'python myscript.py'".
If I give them your instructions -- well, even my computer doesn't have venv-3.5, or venv; I'm sure theirs won't either. Standard Ubuntu installs don't have virtualenv or pip either, so non-root users have pain installing any of these things.
I mean, even within a single major version there are changes that are incompatible. For example, 2.6 doesn't have OrderedDict and doesn't have set comprehensions; 3.3 doesn't have asyncio, 3.4 doesn't have async/await, 3.5 doesn't have type annotations for variables, etc.
So having python2/python3 doesn't really benefit much, especially since at this point everyone is moving away from 2.7.
Edit: the venv module has been built into Python since 3.3 (with pip bundled via ensurepip since 3.4).
More generally, the whole subthread is about the behavior of the command 'python'. Responding to the complaint "I want 'python' to have behavior X" with "it doesn't, so do something different" isn't really appropriate to a discussion of how 'python' should behave.
They did duplicate this functionality in the Python launcher on Windows , but that's 'py', not 'python'. There might be good reasons for not replacing the regular python binary with a launcher on Linux, I don't know.
Well, the only reason that functionality is available in the shell is that python is capable of processing python scripts. The shell isn't. So this question doesn't make sense to me.
#!/usr/bin/env python3 heading an executable script.py means "when I execute this file, I really mean 'run the command /usr/bin/env python3 script.py'", and that in turn means "run the command 'python3 script.py'".
And running scripts by invoking python yourself is absolutely standard. Consider "python setup.py install" or, for django, "python manage.py shell".
I'm not suggesting that the shell would execute Python scripts, I'm suggesting that the shell can be used to dispatch the script to the correct Python version. Just like all the other scripting languages. You seem familiar with the concept, so I don't get what's confusing about it.
I agree that 'python' should read the shebang when available; at the moment PEP 394 says it should mean 'python2', but that is subject to change in the future. Still, it's easy to solve the problem yourself: either invoke scripts directly with the ./script.py syntax, or make a script that reads the shebang and uses the right version, then symlink that as 'python' instead.
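Such a shebang-reading shim could be sketched in a few lines of Python (a toy illustration of the idea, not the PEP 394 recommendation):

```python
#!/usr/bin/env python
"""Hypothetical 'python' shim: inspect a script's shebang line and
re-exec the interpreter it names, defaulting to python2 per PEP 394."""
import os
import sys

def interpreter_from_shebang(line):
    """Return the interpreter named by a shebang line, or None."""
    if not line.startswith("#!"):
        return None
    parts = line[2:].split()
    if not parts:
        return None
    # '#!/usr/bin/env python3' names the interpreter as env's argument.
    if os.path.basename(parts[0]) == "env" and len(parts) > 1:
        return parts[1]
    return os.path.basename(parts[0])

def run(script, args):
    with open(script) as f:
        target = interpreter_from_shebang(f.readline()) or "python2"
    # Replace this process with the interpreter the script asked for.
    os.execvp(target, [target, script] + args)

if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1], sys.argv[2:])
```

Symlinked as 'python', this would make `python script.py` honor the script's own shebang.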
Instead, they didn't play fair, and gave themselves an unfair advantage.
Seriously, what am I reading here?
So, yeah, IPython dropping 2.7 is pretty huge. Almost like moving to a different language.
Oh, that's plain silly. Isn't the whole point of Python 2.7 to facilitate the transition to Python 3?
Let's say I developed a python-like language called "schlython", can I call on the python-2 community to convert all "old" python-2 code into my competing language?
> nobody was forced at gunpoint
The caretakers of py2 are the promoters of py3. Support was dropped for py2 without any serious effort to find new caretakers, so that the demise of py2 would promote switching.
Reading through your posts here, I don't know what your problem is with the Python team, but please do get over it. It's incredibly annoying dealing with someone who seems to think they're entitled to more than a decade of support for a version of a free product.
> but please do get over it
This is an ad hom - please make your posts more substantial, it feels like you are the one with a chip on your shoulder if you can't respond to these points. Your original comment added nothing to the thread but snark.
I understand that some people want to keep old code running as long as humanly possible, which is their prerogative. But there's no reason to imply that conversion to Py3 is unduly difficult or something that shouldn't be undertaken, even for large/complex codebases. If Google can write Grumpy to transpile Python code to Go, there's no reason it can't improve 2to3 to handle the incompatibilities.
The truth is, once most libraries are 3.x compatible, porting is very easy (as easy as a 2.x -> 2.y transition). And now, we've reached that point. People are starting to catch up.
There are some exceptions of course, those who heavily rely on 2.x unicode behaviour and such, but all in all they are rare. So now, it's much easier than it used to be.
You mean I can just run all my python2 programs with the python3 runtime with no changes at all? Because that was how every upgrade before Python 3 went.
Breaking changes are a routine thing in any maintained language; heck, even a conservative project like GCC breaks code between major releases.
The code that doesn't run on Python 2+3 only runs on Python 3. Have at it.
eikenberry asked: "can just run all my python2 programs with python3 runtime with no changes at all"
You replied: "Nearly all my code co-runs on 2.7 / 3.6 with almost no hacks. So, yes.", implying that, yes, you can run py2 programs without change, because this is true of your code
To which I replied "And your code is all code?"; Meaning, just because this is true of your code, doesn't mean it is true of eikenberry's code, or any/all code.
You then responded with a project that apparently runs fully on py3, but only partially on py2 - what is the point you are making by posting that project?
So I guess it's time for me to get on board with Python3!
Because you are suffering from the exponential growth mindset, in which the next new thing is as important as the sum total of all things that came before it. In that mindset, the future is emphasized and the past is heavily discounted.
But that mindset is not shared by most firms in most industries, it's a unique pathology to the web.
Businesses don't re-write working code just because a newer language version is available. Landlords don't tear down old apartment buildings just because more efficient building technologies are available. We live in a world in which COBOL runs a lot of mission critical code.
Anything that touches important code that is now working is a risk and an expense, and what is the tangible gain for undertaking this expense? What new features will be added? How much more revenue will you get? What is the opportunity cost of having your engineers do that than something else, like adding a feature or improving test coverage?
Before you say that it's simple, be aware that you are talking about messing with libraries that may not be maintained anymore. You are going to spend a lot of time debugging those old libraries as well as writing unit tests for them. Nothing in the world of software engineering is simple, especially when it comes to maintaining a large body of scripts.
Imagine if, in the Java world, it was announced that the Java 8 runtime would not support running Java 6 or older jars. How many jars are businesses running that were written in 2005? No one even knows who wrote those libraries.
Suppose in the C world it was announced that code written to C99 and before would no longer compile. The Linux kernel has code written in the 90s, and GCC has code written in the 80s, and it's still supposed to compile under the most modern compiler.
Moreover automated code re-writing tools don't come with guarantees of soundness or accuracy. There will be breakage, it will occur in random places, and there is zero upside to spending a lot of money to get an existing project to the same level of functionality as it had before you started messing with it.
People who work in other languages get this. It's really not a difficult concept. Everyone other than the Python community agrees that Python did a massive screw up with python 3, just as the Perl community committed suicide by making Perl 6 not be compatible with Perl 5.
I don't pretend to know what the future of Python will be -- maybe there aren't enough enterprise users out there to make a difference to the direction of the language -- but you do need to understand why breaking existing code is a deal-breaker for the majority of business customers. You may disagree, but at least don't pretend that there is some irrational, mysterious resistance to converting python2 code to python3.
IMO that's either inaccurate or misleading.
In Perl 6, `use` followed by a module name followed by a `:from` adverb leads Perl 6 to load the module, initialize it, and import its public functions, constants, variables, etc., AUTOMATICALLY BINDING THEM TO PERL 6, for any given supported other language the module is written in, provided someone has installed a suitable loader/binder (and the user has installed that plus the other language's interpreter, and the other language's module(s) that they wish to use).
I'll start with a toy example.
With a suitable "Perl 5" loader/binder in place, one could write the following and it would work as expected provided the user had installed the loader/binder, a compatible Perl 5 interpreter, and the Business::ISBN module (from https://metacpan.org/pod/Business::ISBN):
use Business::ISBN:from<Perl5>;
my $isbn = Business::ISBN.new( '9781491954324' );
With Stefan's loader/binder, not only do toy examples like the above work, so do non-toy examples like writing controllers for the Catalyst MVC web framework even though Catalyst is large and complex, written in Perl 5 with XS (C lib) extensions, written without regard to Perl 6, and even though a controller has to be a subclass of a Perl 5 class provided by Catalyst.
The same feature is available in any language with a decent FFI (must support loading an arbitrary shared library, marshalling/demarshalling data between languages, and invoking functions in the shared library).
This is possible in Python, Ruby, Java, Racket, Rust, Perl, Go, and plenty of other languages I'm forgetting or haven't used. Yet who would seriously argue that "Python is compatible with Perl" because you can embed libperl in a Python program?
You might as well argue that "Rakudo is compatible with libssl", for all the relationship Rakudo source code has with Perl source code.