
One million lines of code ought to be enough for anybody - chmaynard
https://lwn.net/SubscriberLink/807218/7589bd420fa9cfbe/
======
delsarto
> (no file system would need a path longer than 260 characters, right? :o) )

This is poking fun at Windows' MAX_PATH, but it's very easy to hit
BINPRM_BUF_SIZE's 128-character limit for #! lines; maybe not so much for
humans, but CI systems install things in weirdly deep directories (e.g.
virtualenvs ... see
[https://github.com/pypa/virtualenv/issues/596](https://github.com/pypa/virtualenv/issues/596)).

So the old adage about glass houses and stones is still good advice ;)
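For illustration, a minimal sketch of how a deep CI path blows past that buffer (the 128 here is BINPRM_BUF_SIZE as it stood before Linux 5.1 raised it to 256; the helper name and the example paths are made up):

```python
# Hypothetical helper: would "#!<path>\n" fit in the kernel's shebang
# buffer? BINPRM_BUF_SIZE was 128 bytes on Linux until 5.1 raised it.
BINPRM_BUF_SIZE = 128

def shebang_too_long(interpreter_path: str) -> bool:
    # The kernel reads "#!" + path + newline into a fixed-size buffer.
    return len("#!" + interpreter_path + "\n") > BINPRM_BUF_SIZE

print(shebang_too_long("/usr/bin/python3"))                         # False
print(shebang_too_long("/ci/" + "deep/" * 30 + "venv/bin/python"))  # True
```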

~~~
throwaway8941
Nix developers ran into this very limit.

[https://lwn.net/Articles/779997/](https://lwn.net/Articles/779997/)

~~~
aidenn0
Unless they split the line without whitespace, that example is already
non-portable, because it's not well defined what happens to shebang lines with
more than one argument. IIRC Linux merges all of them into a single argument,
but other unixes do different things.

------
temac
I had to check to see if it was a repost of e.g. an April joke proposal or
something like that.

Nope. One guy wants to arbitrarily limit assorted Python things to
1,000,000, cites vague potential optimization advantages (in CPython, yeah,
right...), and doesn't even want to detail the impact of said optimizations
when asked further.

IMO this is just a waste of time for everybody and should be firmly rejected
ASAP. Even just proposing to limit to 1M "source" lines is extremely naive and
the sign of somebody without much real world experience: there are many cases
when "source" code is not the actual source, and has orders of magnitude more
lines.

------
wolfd
I still don't understand why anyone would want this. It seems pretty hostile
to the users of Python, without any real benefit. Maybe the benefit is that
the people pushing for the change get to smugly know that people won't do
horrible things with the language. Maybe I don't get it, but... why?

~~~
swiley
It’s about having a complete specification so that the compiler and VM can be
correct.

Correctness with arbitrarily long input gets complicated very easily. I was
originally skeptical when reading; it seemed ridiculous and arbitrary, until I
realized that's what they were going for.

~~~
sitkack
It is not. Python doesn't have a spec; it has the CPython interpreter. This is
repainting the facade while the sump pump still isn't working.

There are myriad better ways to improve Python, and this isn't one of them.

------
sandoooo
List of languages that compile to python:

[https://github.com/vindarel/languages-that-compile-to-
python](https://github.com/vindarel/languages-that-compile-to-python)

This change is almost guaranteed to break most of these.

~~~
majewsky
Also, aren't there packers that compress your entire Python project into a
single file for easy deployment?

~~~
viraptor
This could easily be worked around: having one huge string to exec would not
trigger any of the proposed limits. But I'm not sure useful apps are packed
that way. Multiple namespaces in a single file sound pretty tricky to
do/manage.
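A toy sketch of that workaround (the "module" strings are made up; whether this actually dodges the proposed limit would depend on where CPython enforces the check):

```python
# Toy packer sketch: concatenate "modules" into one source string and exec
# it into a single namespace, so no file on disk ever has many lines.
module_a = "def greet():\n    return 'hello'\n"
module_b = "VERSION = '1.0'\n"

namespace = {}
exec(module_a + module_b, namespace)
print(namespace["greet"](), namespace["VERSION"])  # hello 1.0
```

The tricky part viraptor mentions is visible even here: both "modules" share one flat namespace, so name collisions between packed files become the packer's problem.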

------
janzer
Just want to note that this is only a proposal, and that:

  * The steering council decided that they will make the final decision on it directly. [1]

  * There has definitely been a less than enthusiastic response from the Python devs overall, and from some of the steering council members specifically, including GvR. [2]

[1] [https://mail.python.org/archives/list/python-
dev@python.org/...](https://mail.python.org/archives/list/python-
dev@python.org/message/KY46EXGLKNTFMQZXKHMMYWD2GIM5PDL5/) [2]
[https://mail.python.org/archives/list/python-
dev@python.org/...](https://mail.python.org/archives/list/python-
dev@python.org/message/EYV2CCV55UC7Z2EVG4DRTFUVH3WRHWNB/)

------
glofish
It made me wonder how fast Python would parse a simplistic program with 1
million lines.

    
    
      N = int(1E6)
    
      for x in range(N):
          print(f"x{x}={x}")
    

generates lines like:

    
    
      x0=0
      x1=1
      x2=2
      ...
    

running it with:

    
    
      time python big.py
    

takes just 5 seconds - I am impressed! Maybe the limit is too low.

~~~
BubRoss
You are likely just benchmarking the time it takes to print one million lines
to stdout.

~~~
nicebill8
I assume the first script isn't the benchmarked program, but that it just
generates it. The pure parsing of the second script would be the benchmark.

Having said that, "5 seconds" is fairly meaningless on its own: we can't
really say it's good or bad without some basis for comparison.
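A sketch of that pure-parsing measurement, scaled down to 100k lines so it runs quickly (`ast.parse` runs only the parser, with no bytecode generation or execution):

```python
# Sketch: time only CPython's parser on a generated program of simple
# assignments, leaving out compilation, execution, and printing entirely.
import ast
import time

N = 100_000  # bump to 1_000_000 to reproduce the thread's experiment
source = "\n".join(f"x{i}={i}" for i in range(N))

start = time.perf_counter()
tree = ast.parse(source)  # pure parsing into an AST
elapsed = time.perf_counter() - start

print(f"parsed {N} statements in {elapsed:.3f}s")
```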

~~~
jvanderbot
A million sequential memory accesses should not take 5 seconds; maybe 5 ms.
Most likely the time is spent on VM/interpreter startup and shutdown, OS
overhead, etc.

~~~
gpm
That's not a million sequential memory accesses.

Assuming a fairly naive Python interpreter, it involves:

- Parsing a million lines of code

- Outputting bytecode for a million lines of code

- Running a million lines of bytecode, including:

- Allocating a million int objects (ints are heap-allocated in Python)

- Hashing a million identifiers

- Putting those million identifiers into a hashmap, each pointing to the
corresponding int object

- Deallocating those million int objects

Plus VM/interpreter startup/shutdown, but we can benchmark that by running an
empty Python file, and it's not significant.
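One way to split those costs apart (a rough sketch, scaled down to 100k lines): `compile()` covers the parsing and bytecode-generation steps, while `exec()` covers the int allocation, identifier hashing, and namespace inserts.

```python
# Sketch: separate parse+compile time from execution time for a generated
# program of simple assignments, to see where the seconds actually go.
import time

N = 100_000  # scaled down from the thread's 1,000,000
source = "\n".join(f"x{i}={i}" for i in range(N))

t0 = time.perf_counter()
code = compile(source, "<big>", "exec")  # parsing + bytecode generation
t1 = time.perf_counter()

namespace = {}
exec(code, namespace)  # int allocation, identifier hashing, dict inserts
t2 = time.perf_counter()

print(f"compile: {t1 - t0:.3f}s  exec: {t2 - t1:.3f}s")
```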

~~~
jvanderbot
Given the downvotes, I'm fairly certain I missed the direction of discussion.
5 seconds is an astronomical amount of time to do something as described in
the parent comment.

~~~
glofish
Is 5 seconds such a long time really? For parsing a file, then creating and
evaluating 1 million local variables, all inserted into the local namespace?

Made me wonder, how long does it take to compile a million line long C
program? Let's see:

    
    
      N = int(1E6)
    
      print("""#include <stdio.h>
      int main() {
      """)
    
      for x in range(N):
         print(f"int x{x}={x};")
    
      print("""
        return 0;
      }
      """)
    

now generate the code and compile it:

    
    
      time gcc big.c 
    

drumroll ... ummm ... look at that ... it doesn't finish, hangs forever (or at
least longer than I cared to stick around, which was 10 minutes) ...

notably, for 10,000 variables it takes just 0.3 seconds, but raising that
number one order of magnitude to 100,000 makes it hang at 100% CPU and 3%
memory used. Something has awful scaling behind the scenes.

~~~
jvanderbot
But now we're timing an optimizing compiler, not an interpreted program of
1 million lines. I understand that you couldn't get a 1-million-line program
to compile, but still, I would expect compilation to take longer than
execution for just about any non-looping program. Again, I am missing the
direction of this discussion.

Parts of compilation (like optimal register allocation) are _technically_
NP-hard, meaning they can scale very poorly, as you described, whereas
interpretation involves none of that (no register allocation, etc).

[https://cs.stackexchange.com/questions/22435/time-
complexity...](https://cs.stackexchange.com/questions/22435/time-complexity-
of-a-compiler)

~~~
glofish
The main purpose of the exercise was to explore what happens when you throw 1M
lines of code at a language; our estimates of what happens could be wildly
inaccurate.

I was pleased with Python finishing in 5 seconds, because I thought it might
not even work at all due to some internal implementation detail that I was not
aware of - like the Ruby example that fails with a stack error. I am pleased
to see that Python can handle it.

As for the GCC compilation, I find it quite unexpected that it compiles 10K
lines in just 0.3 seconds but then seems to hang on 100K lines. Not so easy
to explain.

------
Aperocky
No, don’t control what people do!

Remember how, before Unix, file systems used to enforce what could go in a
file instead of allowing you to store whatever in whatever? That kind of
suffix/file validation was carried over in part into early Windows. People
were sufficiently pissed to create Unix, which had none of that BS.

Fewer rules are better than more rules. You can always put whatever you want
into best practices, but you can't expect people to follow hidden rules like
this one.

~~~
tbrock
Maybe it’s not appropriate for the official cpython but would be appropriate
for something like pypy?

~~~
Aperocky
Maybe, that would be up to the project.

Pypy is in a spot where, if you're after speed, you would do it in another
language, while the package accessibility that CPython has is gone. I don't
see it going mainstream, but it's a nice project - and I can see why it could
have more limitations.

------
thdrdt
Original title "One million ought to be enough for anybody".

The proposal: [https://lwn.net/ml/python-
dev/93cf822c-4d67-b8f7-1d91-7d8053...](https://lwn.net/ml/python-
dev/93cf822c-4d67-b8f7-1d91-7d80536d5bd3@hotpy.org/) ([Python-Dev] PEP
proposal to limit various aspects of a Python program to one million. )

------
rhacker
The problem with 1M lines of code as a limit is that some ML tools actually
export a hyper-fast-executing C (or other-language) program that emulates the
built/trained model.

I don't know why people would assume 1M is enough of a limit when there really
isn't a need to limit that aspect at all.

------
maest
This reminds me of q/kdb's limits[1]:

    
    
        8 maximum parameters on a function
        96 maximum constants
        32 max globals
        24 max locals
    

The self-flagellating q developer will enjoy these limits under the guise of
good-coding-practice enforcement (if you have more than 24 local variables in
your function, you should split your function up).

1: Table of Limit Errors at [http://www.timestored.com/kdb-guides/kdb-
database-limits#con...](http://www.timestored.com/kdb-guides/kdb-database-
limits#constants-error)

------
Glyptodon
Are there common assembly/CPU-native ways of partitioning a 64-bit register to
store multiple smaller values and do work on them? (I haven't written assembly
since some MIPS in college.)

~~~
billfruit
It is there on x86-64 (the packed-integer SIMD instructions, going back to
MMX, operate on lanes within a single wide register); not sure about other
archs.

------
laretluval
100,000 lines was enough for Terry Davis.

------
asah
21 bits seems weird if the goal is optimization. 24 (a multiple of 2, 4 and 8)
would seem more natural and incrementally less controversial (8 MLOC vs 1,
assuming signed line numbers)

~~~
antoinealb
From the article, the observation is that three 21-bit values can be packed
into a 64-bit integer.
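The bit arithmetic being alluded to is straightforward (a sketch of the packing, not CPython's actual code):

```python
# Sketch: three 21-bit values occupy 63 of a 64-bit integer's bits.
MASK21 = (1 << 21) - 1  # 0x1FFFFF; max value 2,097,151, roughly double 1e6

def pack(a: int, b: int, c: int) -> int:
    assert all(0 <= v <= MASK21 for v in (a, b, c))
    return a | (b << 21) | (c << 42)

def unpack(word: int) -> tuple:
    return (word & MASK21, (word >> 21) & MASK21, (word >> 42) & MASK21)

print(unpack(pack(1_000_000, 0, 42)))  # (1000000, 0, 42)
```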

~~~
rwmj
Which certainly _sounds_ like a terrible idea, and there's no actual evidence
presented of performance gains or benefits.

~~~
ModernMech
The obvious benefit would be that if your 3 values always fall in that range,
you save the 129 bits that would otherwise always be zero.

~~~
rwmj
But what proportion of the Python interpreter's memory is consumed storing
line numbers? How much slower will it run if it has to unpack the numbers any
time it needs to use them? Does Python even store triplets of line numbers?
(This seems unlikely to me, so to achieve any saving at all will require
invasive code changes.) These questions could be answered with hard data, but
the proposal has no data at all.

~~~
ModernMech
> But what proportion of the Python interpreter's memory is consumed storing
> line numbers?

I don't think the packing part was talking just about line numbers; they
propose a number of things that would be limited to 1e6 elements. The question
about performance is a good one and needs to be studied. This proposal is
arguing from a different perspective, though, one that focuses on reducing
ambiguity rather than optimizing for performance. Python as a language is an
exercise in sacrificing performance for other objectives, so I think it's fine
for a language like this to consider the question even if it means performance
is impacted negatively.

------
ganzuul
How many lines of code can an organization effectively maintain?

