One benefit of using a main function in Python is performance. If you’re doing significant computation at the top level, it’ll probably run faster if moved into a function.
I believe this comes from the fact that local-variable lookup is faster than global-variable lookup - because the former is lookup-by-index and the latter is lookup-by-name.
The same is true in Julia: “Any code that is performance critical or being benchmarked should be inside a function.” [0]
Not just because it's a lookup-by-name, but because globals are mutable everywhere, the interpreter has to assume that some other code has mutated the variable and has to re-fetch it in many cases.
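A rough way to see the by-index versus by-name difference is to look at the bytecode CPython generates. This is only a sketch: the two snippets are made up for illustration, and the exact opcode names vary between CPython versions (newer releases have several LOAD_FAST/STORE_FAST variants), so treat the printed lists as indicative rather than definitive.

```python
import dis

# Two hypothetical snippets doing the same work: once at module level,
# once inside a function body.
MODULE_SRC = "x = 1\ny = x + 1\n"
FUNC_SRC = "def f():\n    x = 1\n    return x + 1\n"


def opnames(code):
    """Return the list of bytecode instruction names in a code object."""
    return [ins.opname for ins in dis.get_instructions(code)]


# Top-level code: variables live in the module dict and are looked up by name.
module_ops = opnames(compile(MODULE_SRC, "<module-demo>", "exec"))

# Function body: variables live in numbered slots and are accessed by index.
namespace = {}
exec(compile(FUNC_SRC, "<func-demo>", "exec"), namespace)
func_ops = opnames(namespace["f"].__code__)

print("module level:", module_ops)
print("in function: ", func_ops)
```

At module level you should see STORE_NAME/LOAD_NAME; inside the function the same variable is handled by the STORE_FAST/LOAD_FAST family.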
> It's up to the release manager, but personally it feels like you're pushing too hard. Since you've done that frequently in the past I think we should help you by declining your request. 3.8 is right around the corner (on a Python release timescale) and this will give people a nice incentive to upgrade. Subsequent betas exist to stabilize the release, not to put in more new stuff.
Another benefit is re-usability. If you want your program to be callable as a module or script from elsewhere without going through the tedious shell functions, "def main()" is the only way to go.
The difference is that you've encapsulated your coding logic in mainMethod() instead of writing it under that if condition itself. If the latter were the case, you couldn't call that code from another module without resorting to OS shell-level features.
The code under the "__main__" guard is assumed to be module-specific; that is the purpose of the pattern. If you want to make some code available to another module, you just refactor it into a separate function.
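For anyone who hasn't seen the pattern being discussed, here is a minimal sketch. The greeting logic and the `argv` parameter are purely illustrative assumptions; the point is only that the logic lives in a function that other code can call.

```python
import sys


def main(argv=None):
    """Hypothetical entry point: greet each name and return an exit status."""
    if argv is None:
        argv = sys.argv[1:]
    for name in argv:
        print(f"Hello, {name}!")
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    main()
```

If this were saved as, say, greet.py (a made-up name), another module could do `import greet` and call `greet.main(["Ada"])` directly, with no subprocess or shell involved. Many scripts also pass the return value to sys.exit() in the guard so it becomes the process exit status.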
But the downside is that I like to use ipython and %run the file and if it's not in a function, I can then inspect all the intermediate variables and not just the result. Every time I do put it in a function, I regret it for this reason.
The same is true in Julia but for slightly different reasons. It's not just quicker variable lookup in Julia, but the fact that it will be JIT compiled. (Unlike CPython.)
I always saw Python and similar languages (Perl, Ruby, PHP maybe) as attempts at replacements for complicated shell scripts, so it never seemed odd to me that there was no main function, since a shell script has no main either. It just starts executing at the top.
Edit: Just saw that ktpsns and I made pretty much the same comment. I'll add that when I learned C I found the idea of a main function so confusing at first. I wanted to know why it had to be called main and not whatever else I wanted it to be. On the plus side, it led to me exploring how compilers and linkers work, which made me understand hardware a lot better.
especially in contrast with c and other languages that shoehorn you into defining a main function. Namely, you can put a "main" into a module, and import that into another module without causing trouble.
I use this a lot during a development cycle (not so much in prod). Say I'm trying to replace a function foo with a new implementation foo_new in a large package. I write some rudimentary tests and benchmarks in "main" which compare the two methods (including a late-import of the top-level package), and directly execute that file until I'm satisfied. At that point, I'll switch (foo, foo_new) -> (foo_old, foo) and start running unit tests. Once those unit tests pass, I remove "main" (usually, promoting some fragments of it to new unit tests) and foo_old.
> especially in contrast with c and other languages that shoehorn you into defining a main function
i don't think that's a problem in languages with namespaces/modules -- a module might define a main(), it just wouldn't be used when it's imported. (unless you're in C, then you'd get a name clash)
> Many languages start running your program by calling a function of yours that must have a specific name. In C (and many C derived languages), this is just called main(); ... Python famously doesn't require any such function...
Nor did BASIC.
I think the better question is, why does C (and C derived languages) require a main function?
Yeah, this was a funny thing for me to read because my first language was BASIC. When I encountered C, it was the surprising and unique one because it required all code to be inside a function body.
I suppose one of the advantages is that it makes separate compilation easier. Take several .c files, compile each one separately into a .o file, and what happens when you try to link them together? Where should you start executing?
Defining a function with a well-known name is a simple, easy solution to that.
Python doesn't need to solve this problem because you give it a file to start interpreting, and that file is the entry point. You either pass the file name as an argument or you use #! and that determines it.
Maybe it goes back to assembly language. If you look at a block of assembly or machine language code, it's not immediately obvious where the entry point is. You could guess that it's at address 0, but I don't think that's the case for all machines.
I think more important than having a main() function is to clearly delineate the entry point. And in Python, the entry point is the first line of the program.
Now I've noticed, at least within the Arduino environment, that it uses setup() and loop() as entry points, but code is executed before these functions are called, if for instance a global variable definition invokes a class constructor. I don't know if that's how "regular" C works, but it creates ambiguity as to what runs first.
There are other languages where it seems hard to guess where the actual entry point is located unless you know where to look.
You don't get quite this level of ambiguity with standard C, since it does not provide objects (in the object-oriented sense) with constructors, in contrast to C++ that is used in the Arduino environment (in C++, the initialization order of global objects defined in different translation units is indeed unspecified, inviting subtle bugs).
However, many common C compilers provide methods to declare functions to run at initialization time, e.g. gcc's `constructor` attribute, opening this rabbit hole also for C programmers.
Regular C will start someplace in the runtime library, which will do things like global initialization. Once that's complete the library will call main().
The OS determines the exact place in the code where execution starts. In DOS I think it was address 0x100. A more modern OS probably uses an executable file format that allows you to specify the starting address.
The actual entry point to your program is defined in the ELF header of whatever binary you execute (doesn't matter if it is C or not). Usually this entry point is startup code supplied with libc, which after some setup calls the user's main function.
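To make the ELF point concrete, here is a small sketch that pulls the entry address (the e_entry field) out of an ELF header. The function name is my own, and the demo header is synthetic rather than taken from a real binary; the field layout follows the standard ELF header.

```python
import struct


def elf_entry_point(header: bytes) -> int:
    """Return e_entry from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2                 # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    # After the 16-byte e_ident come e_type (u16), e_machine (u16),
    # e_version (u32) and then e_entry (u32 or u64 depending on class).
    fmt = endian + ("HHIQ" if is_64bit else "HHII")
    _type, _machine, _version, entry = struct.unpack_from(fmt, header, 16)
    return entry


# Synthetic 64-bit little-endian header with a made-up entry address.
fake = (b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
        + struct.pack("<HHIQ", 2, 62, 1, 0x401000))
print(hex(elf_entry_point(fake)))  # 0x401000
```

On a Linux box you could feed it the first 64 bytes of a real executable instead; the address you get back is where the loader jumps, which is normally the runtime's startup code and not main.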
> ... why does C (and C derived languages) require a main function?
It doesn't actually. You just need an entrypoint defined for the loader. In practice most people don't want to go around libc, or fiddle with internals. This is much less true in embedded development, where main functions are sometimes missing.
Why doesn't C have global code? Probably because it would require abstracting further away from the generated assembly. If there is code that is not inside a function, how can it be called? Remember that the entry point to all programs is a function.
Disclaimer: I know very little about python internals.
There is a "true" main function for the interpreter itself. From the perspective of the operating system and CPU the python code isn't executing, the interpreter is.
But conceptually, the python code probably doesn't have main function. Python code is so far abstracted from how the CPU works that it likely doesn't need one.
Just seems kinda odd to me to distinguish Fortran and Python here, based on how many levels of abstraction there are between the user code and the main function.
How would you place Bash scripts then? Or a JITed language like Julia?
I only mention python because people were asking about it. There are really two unrelated questions here:
1) How does a computer start executing your code?
2) What kind of abstractions does a programming language provide?
Having anything other than a function as a top level most likely implies an abstraction. The second question is pretty much irrelevant when you are talking about interpreted languages. These aren't programs at all from the perspective of the CPU.
JIT is a different story, and one that I know very little about on a technical level. What happens when pypy generates some native code? Does it generate a function and call it? Does it have some other convention? I have no idea.
Python doesn't require you to have anything around a series of statements. Whereas Fortran requires at least one top-level program block. That program is a top-level function, which directly corresponds to MAIN__ in compiled code.
I ran the test. The assembly for that program creates a "main" function, which contains Fortran init code, and a MAIN__ function that contains the user code.
IIRC it is the C runtime that calls main(). In C all executable code must necessarily be inside functions, so inevitably there had to be one 'special' function that was called first.
> I think the better question is, why does C (and C derived languages) require a main function?
That's not the true question. The true question would be, "why does C (and C derived languages) require that all executable statements be within a function?" Requiring a main function is just a natural consequence of that.
Because it's simpler. C has no concept of nested functions. If it instead allowed users to write the statements of 'main' at the top level, it would complicate the rules for scoping. For example, can an inner function access the outer function's variables? Or jump to its labels? Simpler just to say: all statements live in a function; functions don't nest; and execution starts at main.
At a guess, it makes the language simpler. The main function is just that -- a function. It uses tools the language already has to deal with arguments and exit codes.
Arguably it also keeps things like libraries simpler. For example, what would it mean for me to have top-level expressions in a dynamic shared library? Does it execute as soon as I dlopen() it? Does it create an implicit "init" function that I need to call?
Parts of the C runtime library need initialization and this initialization is done before main is called. Interpreters do this before you have a chance to run any user supplied code.
C needs a main function because all matters of initialization and module dependencies are left to the programmer.
C has no concept of a module A depending on a module B.
If C had top-level executable statements, there would be no defined order to them. Proof: C++ has a way of executing top-level code via file-scope definitions of class objects that have constructors and destructors. The order of these is not defined by the language; it depends on how the object files are linked and such.
Niklaus Wirth's Modula-2 language has syntax for this. Every module has an anonymous block of code that is executed at program initialization time. Modules declare dependencies on each other, which cannot be cyclic. If A uses B, then B's initialization code runs before that of A.
There is a single root module on which nothing else depends, and that module's initialization code runs last. In that initialization code, you can do the application startup; it serves as main.
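The ordering rule described here (dependencies initialize first, the root module last) is a topological sort. A minimal sketch in Python, with made-up module names, using the standard library's graphlib (Python 3.9+); this illustrates the rule and says nothing about how any Modula-2 compiler actually implements it.

```python
from graphlib import TopologicalSorter

# Hypothetical modules: each key lists the modules it imports (depends on).
DEPENDS_ON = {
    "App": {"Parser", "Logger"},
    "Parser": {"Logger"},
    "Logger": set(),
}


def init_order(depends_on):
    """Return an order in which every module comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = init_order(DEPENDS_ON)
print(order)  # ['Logger', 'Parser', 'App']
```

The root module ("App" here) comes out last, so its initialization block can serve as main. A cyclic dependency makes static_order() raise graphlib.CycleError, which matches the rule that dependencies cannot be cyclic.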
I think some other languages in this broad family are that way also, like Ada.
Embedded systems written in free-standing dialects of C still choose to start with a function, though not necessarily main. For instance, Linux starts with a function called start_kernel. That's not the first code in vmlinux that executes; there is assembly code before it, typically in an architecture-specific file such as head.S, which branches to start_kernel.
This is very easy for embedded developers to deal with. There is no special run-time support for special global initialization blocks: it's just a function call named by a symbol that you can use as a branch target in the assembly code.
Even in Lisps, though there is no concept of a main function in dialects like Common Lisp, you have to specify a startup function when you save an executable image. This is because an image is not simply a collection of top-level forms that are to be evaluated, unlike a Lisp source file. It is not obvious what has to be executed when the image is restarted. Without the ability to specify a startup function, a possible choice might be this: that when the image is re-started, the save-image function (whatever it is called in the implementation) will appear to return a second time, perhaps with a value indicating "I'm now returning again due to the image being re-animated".
A compiled C program, at a very basic level, is analogous to a Lisp image.
A Lisp system loads some .fasl files, and then saves an image, specifying a startup function.
Similarly, in a C toolchain, the linker loads .o files and saves an image, recording in that image an entry point where to begin executing.
As a rule of thumb, if it's image-based, then it probably needs a startup function, unless there is an equivalent language feature for handling startup.
That applies to the booting of operating system images also. Consider:
And still I do write a main procedure in Python usually, which is called from that if __name__ == "__main__" thing. It is convenient to leave the top level asap and begin as quickly as possible to use procedures/function, getting out of the area, where one can accidentally change global variables, without noticing (in procedures you would have to write global as a keyword and would hopefully then notice you are doing something possibly dangerous).
Absolutely. But on the other hand, it's still very useful that Python does work like this: it's nice to be able to directly do things at the top level (e.g. setting up "constants" with a little computation, conditionally importing modules or importing fallbacks, etc.). As the article says, it's also an extremely clean way of defining what gets run in what order during startup.
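A sketch of the kind of top-level work meant here: a fallback import and a computed "constant". The choice of simplejson as the optional module is just an example; the code runs the same whether or not it is installed.

```python
import math

# Prefer an optional third-party module, fall back to the standard library.
try:
    import simplejson as json  # optional; may not be installed
except ImportError:
    import json

# A "constant" that needs a little computation at import time.
TWO_PI = 2 * math.pi

print(json.dumps({"two_pi": round(TWO_PI, 3)}))
```

Because the statements simply run in order, whichever import succeeded is in place before anything further down the file uses it.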
Here's a thing: Coming from a world of Perl, PHP, Ruby, Bash, BASIC, batch etc. scripts, a main function is not necessary at all. The procedural code just starts from the first line.
With this in mind, it is weird to even ask for a main function. :-)
Coming from the other direction, random code that executes outside of functions is sloppy. It's harder to reason about and is a regular source of confusion and bugs.
Even with those languages, I find a main method is a good idea just for local-scoping of variables compared to the top level variables being 'global' by default.
PHP is actually my favorite here - since the standard way people use it (behind a webserver) means that old-style PHP had as many entry points as it had files which could get really confusing really quickly.
Modern PHP usage will generally concede the point now and pipe all requests through internal routing executed off of a single script exposed to the webserver/whatever.
And you can perfectly well compile them and keep that same behavior.
C's entry point is in the runtime too, it's not your main function (but yeah, the optimizer will probably inline your main). At the end of the day, there's no real difference between compiled and interpreted languages, all the difference is on the toolset.
Once I worked with a library, which when imported, started printing with a printer to paper. So yes, your Python program absolutely needs a main function and please no side effects at import time!
I don't see how these two are the same. No side effects at import time? Sure. But a program isn't meant to be imported, it's meant to be run, so there's no real expectation that a program should be importable without side effects.
they're not the same, but they're both symptoms of the same underlying python behaviour: when you run a python file or import a python file, the interpreter immediately starts executing statements from the top of the file down.
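You can watch that behaviour directly. The sketch below writes a throwaway module (its name and contents are invented) to a temporary directory, imports it, and captures what the import prints.

```python
import contextlib
import importlib.util
import io
import os
import tempfile

# Hypothetical module source with a top-level statement that has a side effect.
MODULE_SRC = '''
print("top-level statement ran")

def helper():
    return 42
'''


def import_from_source(source, name="noisy_module"):
    """Write source to a temporary file, import it, return (module, output)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name + ".py")
        with open(path, "w") as fh:
            fh.write(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            spec.loader.exec_module(module)  # runs the file top to bottom
    return module, captured.getvalue()


module, output = import_from_source(MODULE_SRC)
print(repr(output))
print(module.helper())
```

The print fires during the import itself, before anyone calls helper(). Swap the print for "send a job to the printer" and you have the library from the story above.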
The main function, in C/CPP programs, satisfies an operating standard in Unix and Windows systems for a defined program instruction entry point. With CPython, main is written into the python executable's source. Python scripts are not C/CPP-source executables so they don't need a main, they don't need this constraint, so what sense is there in imposing it?
No, this isn't so. And besides, the main function isn't even the first thing in your program to run. The standard library does some setting up first. For instance populating argc and argv.
This article really complicates a simple concept: EVERY program needs a main(). Period. Whether it is called "main()" or ".ORG", or it is just an address given to a linker map, or simply a hex address in a BIOS boot sequence. There is no avoiding it: one instruction must come first.
Python, Perl, Ruby, Bash and every other scripted interpreted language IMPLICITLY wraps "main" around your program based on the context, otherwise it wouldn't know where to start.
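In CPython you can actually see that implicit wrapper. As a sketch (the source string is made up, and this is an implementation detail rather than a language guarantee), compiling a file's text yields one code object for the whole thing:

```python
source = 'greeting = "hi"\nprint(greeting)\n'

# compile() turns the whole file into a single code object...
code = compile(source, "<example>", "exec")
print(code.co_name)

# ...and exec() runs it from its first instruction, much like calling a function.
namespace = {}
exec(code, namespace)
```

The code object reports its name as `<module>`, which is about as close as Python gets to an implicit main.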
I find this article obfuscating a simple point.
EDIT: When I say .ORG, I'm referring to the standard assembly language designation for "execute this instruction first", and not ICANN. ;)
Yup - and even the most obfuscated language's main (PHP) has a single entry point - that entry point is just in apache[1] which figures out which of your bazillion PHP files to execute.
This article is bordering on nitpicking.
1. Or whatever other webserver - and actually you can directly invoke PHP via CLI, but I'm ignoring those for the most common way people tend to use it.
Bit snarky, but all I got from this post is "because it doesn't." Many languages don't require a main function. In fact, the earliest languages like LISP (and even punchcards!) don't require it, let alone many other languages like BASIC, awk, etc. The presence of a main function in later languages is, in a way, the outlier.
Why do some languages require a main function? I started out with JavaScript decades ago, and of course JS doesn't require a main function. When I started learning Java I thought the requirement that there must be a main function was just bizarre. I still feel that way, it just feels wrong.
Because languages that compile to a platform binary must obey rules of a system linker and a corresponding stdlib. Javascript is an active host for an entire js program, but C compiler is not. C source files turn into object files (.o/.obj and then .so/.dll for libraries) that have a particular format and may be linked with object files and libraries from other languages. A system linker then decides where the entry point is by looking in every object file for a _start symbol, which is usually implemented by clib, and it basically does:
void _start(void) {
    extern int main(...);
    init_clib();
    exit(main(...));
}
The exact procedure heavily depends on a build target. Also, C compiler cli tool (e.g. gcc) is usually a frontend for a compiler and a linker, among other tools, which may add to the confusion.
Object files are collections of symbols, either external, or defined in that file. Every non-extern symbol is a region (addr, len) with either data or code. There are no unnamed/main symbols by design. In theory, C could abstract that and allow code at column zero, but there is no rationale for that.
You've seen in Java how the main function has the `args` parameter as input, an approach which originally comes from C. In C code, main is actually not the first code that gets run during the program; a shim function, conventionally named "_start" and supplied by the toolchain's runtime, sets up the environment, which includes creating the `args` array, and then calls main and passes args in as a parameter to main. The _start function needs something to call, and args needs to come from somewhere; having an explicit main function with an args parameter isn't the only way to achieve this, but it's not a bad way by any means.
This whole topic is like comparing apples to oranges. The Python interpreter (which IS compiled and has a main function) defines the rules of what's required in Python. And it can reasonably assume that the entry point is the beginning of the .py file it's running.
Why not keep “main” as a reserved function word and just append the magic `if __name__ == "__main__"` behind the scenes if that function is seen? For a language focused on immediate usability, it’s a stupid first hurdle to have to clear to greet the world.
I think the people replying "explicit is better than implicit" are cargo culting, there are quite a few areas where Python was more than happy to add a bit of magic to make the language smoother to use. Besides `if __name__ == "__main__"` is hardly explicit if you're not familiar with the idiom (what's `__name__` exactly? Is it the file name? The process name? When is and isn't it set to `__main__`?).
I can think of a real problem with your approach however: in languages with a main function you usually can't have top level statements, only declarations. So in C if you write at the top level `while (1) { printf("Hi\n"); }` you get a compilation error. Now as compilers get smarter you can get constexprs in some languages but it's still very specific and entirely static. I guess you could also mention global constructors, but that generally comes with severe caveats and should be used very carefully.
Python on the other hand treats top level code like anything else. That's why the `if __name__ == "__main__"` pattern even works in the first place. Having a real-looking main function would send the wrong message IMO, because it would make people think that it's the real entry point of the program, which it's not; in Python the entry point is the first line of the file being executed, end of story.
I do think that Python would benefit having a clearer and more explicitly named helper function or variable for this use case however, something like "if this_file_is_executed()" or something like that. At least this way you'd know what the code is doing instead of this awkward anti-pattern being used these days. But of course by now it's probably not worth changing.
You only need the __main__ magic if you want a python file to be both a module and an executable script. This is a moderately advanced requirement. Beginners will tend to write scripts OR importable modules.
For debugging-in-REPL purposes it's often very convenient to have it both ways. I was taught as a beginner to treat it as idiomatic boilerplate for that reason--no harm in having it, and there when you need it. Then you can just import your script and ad hoc test your functions.
It's a good technique prior to introducing formal unit testing (and even then, supplemental ad hoc has advantages).
1) One of Python's design philosophies is "explicit is better than implicit".
2) This is not a hurdle to hello world.
print("Hello World")
is a complete program that does exactly what you want. Needing to wrap it in a function would be an additional hurdle.
For complex programs; that is a hurdle well worth jumping. But for simple scripts (python is a scripting language), or teaching purposes, it is not always needed.
I understand that what you have is a complete program but every new Pythonist is going to next read the documentation that says, “Having a main method is a really good idea” and go try to implement one. Sure, in 1% of cases, you’ll need the granular control over how main is called but since when do languages cater to the 1%?
Can't you just put code in `def main` and then have `main()` at the top-level if you don't need to worry about imports? It seems like you can probably get pretty far as a beginner before you need to worry about being able to both import and run the same file.
`if __name__ == "__main__"` isn't magic; it serves a purpose. Sometimes __name__ == "__main__" and sometimes it doesn't, and I want to specify different behavior in different cases.
Here, __name__ is the name of the module. When the module is run by itself, this is "__main__", in which case I want the things specified by the if to run right away. This could be calling a main function, but a lot of times it's just some tests or examples.
When the module is imported into another file, its name is not "__main__". In this case I usually only want things in it to do stuff when they are called by the other file.
This doesn't have anything to do with a function literally called "main" though.
if __name__ == "__main__":
    main()
Is a common idiom, but there's no connection between the two uses of the word main.
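Here is a sketch that shows the module name changing. It uses the standard library's runpy to execute the same throwaway file twice under two different names; that emulates "run as a script" versus "imported as demo" (with a real import, Python sets the name for you). The file name and contents are made up.

```python
import contextlib
import io
import os
import runpy
import tempfile

SCRIPT_SRC = '''
print("__name__ is", __name__)
if __name__ == "__main__":
    print("running as a script")
'''


def run_as(run_name):
    """Execute SCRIPT_SRC from a temp file with the given __name__; return output."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT_SRC)
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            runpy.run_path(path, run_name=run_name)
    return captured.getvalue()


print(run_as("__main__"))
print(run_as("demo"))
```

Only the first run prints "running as a script". Nothing in there mentions a function called main; the guard is an ordinary if statement comparing two strings.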
The difference is simply that compiled languages distinguish between 'static' and 'dynamic'. Static things include types, constants, and static variables. In interpreted languages, these things are all dynamic constructs--defining a type is a runtime activity that actually involves executing some code that changes the internal state of the interpreter, and it can (and regularly does) happen intermixed between other computations.
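As an illustration of "defining a type is a runtime activity" in Python specifically (the class names are invented for the example):

```python
import sys

# A class statement is executed like any other statement, so it can sit
# inside an `if` and be chosen at runtime.
if sys.maxsize > 2**32:
    class Word:
        bits = 64
else:
    class Word:
        bits = 32

# type() builds an equivalent class from plain runtime values.
Point = type("Point", (), {"dims": 2})

print(Word.bits, Point.__name__, Point().dims)
```

Which Word class exists depends on what happened while the program ran, which is exactly the intermixing of declarations and computation described above.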
Of course syntactically-speaking, compiled languages could allow users to interleave these static declarations amid instructions and make it the compiler's job to correctly sort out the static from the dynamic (and indeed many languages do this); however, this makes it trickier to write good parsers and it also makes it harder for humans to distinguish at-a-glance the static from the dynamic (and this distinction is important in programming).
> I suspect the answer is "we care about existing programmers more than people who are learning to program".
Arguably if you care about people who are learning to program, you don't just punt on teaching them the distinction between static and dynamic. I think it would be better stated, "we care more about the long-term interests of novice programmers than we do about their short term frustrations".
I didn't claim that static typing depends on putting things inside of an explicit main function. Only that the function "boilerplate" serves to hint to the user that the things in the function body happen at runtime to distinguish from the other things which tend to happen at compile time. Static types are an example of things that happen statically (for example, declaring types). Of course this is just a hint--it's not necessary and as previously mentioned, there are many languages that don't make this syntactic distinction.
> the function "boilerplate" serves to hint to the user that the things in the function body happen at runtime to distinguish from the other things which tend to happen at compile time
'hinting' seems vague. Programmers who know what they're doing know that calling functions happens at run time. Obviously you can call a function without needing a wrapper function or a wrapper class + method.
To be clear, we’re debating the merits of a special syntax that allows the syntactic intermingling of executable code and static declarations. While experienced programmers can probably sift through this and identify which things are static and which are dynamic, not all programmers are experienced (note that the original claim was that the intermingled syntax is better for beginners) and further even experienced programmers may appreciate the explicit syntactic distinction between static and dynamic. Lastly, it also makes it easier to write static analysis tooling (linters, type checkers, compilers, etc) since the intermingled syntax is more complex to implement.
It’s fine if you don’t agree with the value judgments—for example, if you would rather save a handful of characters in “hello world” at the expense of some clarity or technical simplicity. I’m just giving the rationale.
What's the point of teaching newcomers something they need to give up later anyway though? (once their programs get big enough to be split over several files)
[0] https://docs.julialang.org/en/v1/manual/performance-tips/ind...