
Where do argc and argv come from? - signa11
https://briancallahan.net/blog/20200808.html
======
nneonneo
On Linux, after argc, argv and envp comes the even more mysterious auxv, a
key-value store for binary data. The kernel shoves a lot of interesting stuff
into auxv, including AT_RANDOM - 16 random bytes (used to construct stack
canaries and function pointer encryption keys), AT_HWCAP (processor capability
flags), and AT_SECURE (a flag specifying if the program is setuid and
therefore security sensitive). Although a lot of it is meant just for internal
C library use, it can be helpful for programmers too (e.g. being able to check
hardware capabilities without a trip to cpuid).

~~~
TallGuyShort
Huh - I remember using `char __env` as the third parameter to main. After a
quick Googling, it would seem that's a GCC-specific thing. Surprised there's
a Linux-specific (or ELF-specific, as another commented) convention other than
that one.

~~~
jwilk
HN's comment monster ate your asterisks. :-(

It should be:

    
    
      char **env

~~~
TallGuyShort
Ah ha - now there's an argument in favor of the `char *env[]` notation!

------
vidanay
I feel like the article does not address the stated question.

arg(ument) c(ount) and arg(ument) v(ector) is what I was taught.

edit: After a re-read, the article is about where the actual values come from
and not the historical semantic meaning of the variable names.

~~~
zabzonk
Also, in C (and C++) the names of the parameters are not significant - you
could call them x and y and the compiler would be happy.

~~~
vidanay
In what language is the name of a method parameter ever significant?

~~~
senkora
Python. Since any argument can be called as a keyword arg, the names of
arguments are a part of the API, and renaming them can break user code.

I have written bugs because of this.

~~~
loopz
For compiled languages, it's neat to get a compiler error instead of a nasty
bug.

~~~
kortilla
In Python, without explicit analysis using something like pylint, you won’t
find out until function call time.

------
jez
I needed to understand linkers and object loaders for something at work
recently, and I can't recommend _Computer Systems: A Programmer's
Perspective_ enough![1] It answered a lot of questions I'd always had about
computer systems.

I jumped straight into Chapter 7: Linking and learned so much that I read a
bunch of other chapters for the fun of it. E.g., Chapter 7 covered that the
`_start` symbol mentioned in this post is just a symbol in your executable
that the Linux executable loader knows to jump to, but it's not the only
special symbol!

The chapter on linking also covered how loading and linking dynamic/shared
libraries works, which was also really cool. I wrote up some of the things I
learned about that stuff too.[2]

[1] [http://www.csapp.cs.cmu.edu/](http://www.csapp.cs.cmu.edu/)

[2] [https://blog.jez.io/linkers-ruby-c-exts/](https://blog.jez.io/linkers-ruby-c-exts/)

~~~
brandmeyer
In the same vein, "How to Write Shared Libraries" by Ulrich Drepper goes
into much more detail than you might expect given the title.

[https://www.akkadia.org/drepper/dsohowto.pdf](https://www.akkadia.org/drepper/dsohowto.pdf)

------
chenxiaolong
One neat thing I learned is that `argv` and `envp` are contiguous on Linux.
You can change the process name that appears in `ps` by modifying memory that
`argv` elements point to. If you need more space, you can also skip NULL-
terminating `argv` so that it will read on into `envp`.

Chromium, for example, does this:
[https://source.chromium.org/chromium/chromium/src/+/master:s...](https://source.chromium.org/chromium/chromium/src/+/master:services/service_manager/embedder/set_process_title_linux.cc;l=130;drc=1be837b6f1420d2ae8ffb47aa46fe78f522eccba)

~~~
cryptonector
There are OSes where altering `argv[0]` changes the ps strings, and ones where
it doesn't. Arranging to support this is tricky, as it can be a way to attack
users of ps(1) and /proc!

~~~
saagarjha
Fun fact: iTerm recently had to redesign some of their APIs because the
process name APIs would be susceptible to a malicious program overwriting its
argv (the normal API for this has the kernel read out of the process's address
space).

------
m3047
I can't find a reference for DEC's PDP processors, but the 68000 (Motorola)
processor family had LINK and UNLINK instructions useful for constructing a
(calling) stack frame. DEC's later VAX (Virtual Address eXtension) processor
line included stack frames baked into the silicon. The CALLS instruction was
called with the number of 32 bit arguments previously pushed onto the stack;
the CALLG instruction was called with a pointer to a memory structure
consisting of an argument count and an argument vector.

From the foregoing progression it can be inferred that this idiom was common
and deemed a Good Thing by some faction in computer science of the time.

As for functions with specific names, when linking against libraries there are
typically symbol tables for resolving references between separately compiled
modules.

As a self-professed (I took out a small display ad in the print publications
_Computer World_ and _Asian Computing Monthly_ in 1984) "VAX Hacker for Hire",
I made use of C's utter lack of concern for such things by declaring a char*
arg and using pointer arithmetic to get the actual count of args to implement
optional parameters; I wasn't the only one. Between that and abusing the REF
and VAL pragmas in pretty much all DEC programming languages, it was good
times.

~~~
bartvk
> "VAX Hacker for Hire"

What did that cost? And was it effective in getting you projects?

~~~
m3047
1) My recollection was that _Computer World_ was around $400 for two
insertions, and _Asian Computer Monthly_ was slightly less for 6. They were
small ads.

2) Yes, I did make the acquaintance of a local (!) company, which I ended up
having a 10 year relationship with. I also had a few entertaining
inquiries, none of which got me in trouble with the law, but also didn't make
me any money. (Know when to say "no".)

~~~
bartvk
Fantastic to hear. 400 bucks is extremely cheap for a long term relationship
with a client. Agencies in Europe put 20% on top of your rate.

~~~
m3047
It was in 1984. ;-) For comparison, in 2018 I took out an ad in the _Yakima
Herald_ prospecting for some ag tech work and didn't get a single nibble; that
ad cost me over $1000 for 4 insertions.

------
ChrisSD
In the Windows world there is no argc/argv. There is only the commandline
string. The C runtime will parse this to emulate the required arguments.

The commandline is stored in the process environment block and can be set to
anything when the process is created.

~~~
sumtechguy
[https://docs.microsoft.com/en-us/windows/win32/learnwin32/wi...](https://docs.microsoft.com/en-us/windows/win32/learnwin32/winmain--the-application-entry-point)

You then pass it into
[https://docs.microsoft.com/en-us/windows/win32/api/shellapi/...](https://docs.microsoft.com/en-us/windows/win32/api/shellapi/nf-shellapi-commandlinetoargvw)
to get it split out if you need it.

I can see why they did it that way. No reason to waste cycles on something if
you do not need it. Though most frameworks go and call it for you anyway these
days.

~~~
FreeFull
It's a carryover from the MS-DOS days. One unfortunate part is that each
program has to implement globs and any similar feature itself.

~~~
frutiger
Even in a modern Unix, globs are expanded by the shell before invoking exec(3)
with the individual arguments split out. They are not split/expanded by the C
runtime.

------
ploxiln
Just in case the author shows up here, there's a confusing little typo in
"Passing argv to our program":

    
    
        _start:
         popq %rdi
         movq %rsp, %rdi
         callq main
    

the second %rdi should be %rsi, which is confirmed by the explanatory text:

> If it's really that easy all we will need to do is add movq %rsp, %rsi after
> the popq %rdi we just added.

EDIT: now fixed

------
jeffrallen
Too bad the author skips over envp and what comes after...

[http://dbp-consulting.com/tutorials/debugging/linuxProgramSt...](http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html)
mentions something really fun after envp, which is the ELF auxiliary info,
including a random seed offered by the kernel to seed rand if you want it.

See:
[https://man7.org/linux/man-pages/man3/getauxval.3.html](https://man7.org/linux/man-pages/man3/getauxval.3.html)

------
dim13
Looks like a little known fact, but there is a third parameter too:

    
    
        int main(int argc, char **argv, char **envp)

~~~
kps
Non-portable, though. It originated in Unix and is supported by sufficiently
good implementations and clones, but it isn't even required by POSIX.

------
mywittyname
Is this something that the compiler normally handles or is it handled by the
runtime environment upon executing a program?

~~~
xandris
At least on the toolchains I'm familiar with, the linker will insert crt0.o, a
very basic part of the C runtime, which contains a _start implementation that
pulls argc, argv, and envp from the stack:

[https://en.wikipedia.org/wiki/Crt0](https://en.wikipedia.org/wiki/Crt0)

So it's not really part of the "compiler" itself, it's usually an object file
the linker includes in the linking job on your behalf.

~~~
saagarjha
Interestingly, macOS doesn't provide a crt0.o: execution begins in dyld for
most programs, which calls the normal program entrypoint (usually main). Linux
will do this too if you are dynamically linked.

------
saagarjha
> There is a third argument to main

Note that on macOS there is an additional _fourth_ argument to main, char
*apple[] (I would write this as a double pointer, but Hacker News would eat it
if I did). It contains things passed on from the kernel mostly only relevant to
libc and dyld.

------
sukilot
What's the point of this? If you are writing a C program, get a C book and
learn all about argc and argv. If you know a good C book, post about it to HN.
If you aren't writing a C program, this is irrelevant. Why does HN every day
pull a random page from a manual and vote it to the front page? This is a
terrible way to learn anything.

~~~
wruza
This is the best way though to meet some guys who never wrote a book, but are
competent enough to delve into details that barely any book mentions. If you
want to know one thing deep, sure, get a book or two. But if you want to
create your understanding on numerous topics (wide learning), your way is a
very long journey that might not even end before you retire. I don't think
someone learns here as in a class, we just discuss matters of our interests.
You start with replying to TFA, and then it turns into whatever it turns
into. Nothing new for a forum-like site, really; many do not even need
clickable headlines, since they do not click on them.

