.C as a file extension for C++ is not portable (nibblestew.blogspot.com)
22 points by ingve on May 13, 2021 | 64 comments



Interestingly, the blog post doesn't mention the actual problem that's the source of the MSVC behaviour: not all filesystems are case-sensitive.

So if you care about your project being able to be checked out on Windows, please refrain from using .C extension for C++ files. I've seen a toy language repo that used "something.s" for the test source file and "something.S" for the part of the language runtime (or something like this, don't quite remember), and checking it out on Windows was great fun: git declares that the working directory at the same time is both dirty and has no changes in files.
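A minimal sketch of how such collisions could be caught before anyone checks out on Windows or macOS: group the tracked paths by their case-folded form and report every group with more than one member. The function name `find_case_collisions` and the sample paths are hypothetical; in practice the list would come from something like `git ls-files`. It only compares whole paths, so directories that differ only by case are not reported.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in paths:
        # casefold() only approximates a case-insensitive filesystem;
        # real filesystems use their own folding tables.
        groups[path.casefold()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Hypothetical repository listing, for illustration only.
    tracked = [
        "runtime/start.S",
        "runtime/start.s",
        "tests/something.s",
    ]
    for group in find_case_collisions(tracked):
        print("collision:", ", ".join(group))
```

Running a check like this in CI would flag the problem on Linux, where both files can coexist and nobody notices.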


That's actually an interesting one - for gcc at least `.S` indicates an assembly file that requires preprocessing via `cpp`, and `.s` does not. You could definitely wire it up such that a different extension indicates preprocessing (and there might be some others already), but the typical convention would run into this problem. Of course, you're not supposed to check in the preprocessed source, but you'd likely run into the problem at build time when it tries to create the preprocessed file with the same filename as the original, so it doesn't really solve the problem.


> but you'd likely run into the problem at build time when it tries to create the preprocessed file with the same filename as the original

Only if you use -save-temps, which is rarely used (it's there mostly for when you want to debug the preprocessed code). Otherwise, in the common case where you are going from the .S to the .o, the intermediate .s is saved somewhere in /tmp (without -pipe) or exists only in memory (with -pipe).


Yeah that's a good point, I didn't really think about it too hard but usually you just go straight from the `.S` to the `.o` and more-or-less skip the intermediate.


If you care about Windows you also need to watch out for filenames like aux, con, prn, etc


Yep, checking out NetBSD or Minix 3 on Windows runs into this exact problem with the directory "external/mit/xorg/lib/xcb-util/aux". Thankfully, no NetBSD or Minix users need X anyway, right? :)
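For anyone who wants to lint for this, here is a rough sketch that flags path components matching the classic DOS device names. The helper name `reserved_components` is made up, and the list is an assumption limited to the commonly documented names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9); a name followed by an extension, such as `aux.c`, is treated as reserved too.

```python
# Commonly documented reserved device names on Windows (assumed list).
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def reserved_components(path):
    """Return the components of `path` that collide with a device name."""
    hits = []
    for part in path.replace("\\", "/").split("/"):
        # The part before the first dot is what matters: "aux.c" is still "AUX".
        stem = part.split(".", 1)[0].rstrip(" ").upper()
        if stem in RESERVED:
            hits.append(part)
    return hits


if __name__ == "__main__":
    print(reserved_components("external/mit/xorg/lib/xcb-util/aux"))
```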


I've run into `.C` files a few times since the 90s. They were associated with projects managed by developers who were actively hostile to the notion that people might do development on Windows and they knew what they were doing.


Over the last 7 years I've seen many small or hobby projects on GitHub that started non-portable, and there would always be an issue/comment about "how do I compile it on my OS?". Now, it's completely anecdotal and recalled from memory, but the reactions of Windows-based developers to "doesn't compile on Linux" was generally "OK, I'll try to take a look", while the reaction of Linux-based developers to "doesn't compile on Windows" was generally "switch to Linux or try to compile with Cygwin but honestly, idgaf lol".

Have you ever seen a Windows program that tries to create an "etc" directory in the root of C: and put its config files there? I have, and it was "ported" from Linux.


Windows is expensive. Linux is free. Now that MS offers free trial Windows VM images for developers it's less of an issue, but traditionally it's been significantly cheaper for a Windows dev to set up a Linux system than for a Linux dev to set up a Windows system.


> Now, it's completely anecdotal and recalled from memory, but the reactions of Windows-based developers to "doesn't compile on Linux" was generally "OK, I'll try to take a look",

This is just my prejudice, I haven't actually checked, but I think most projects that have support for both Linux (or Unix-y OSes) and Windows generally start with Linux/Unix and then have Windows support added. I think the opposite is rare.

Maybe things changed? I think most Windows developers, say, 10 or 15 years ago might not even have known what Linux was, or had heard the word only as some obscure technology used by few. In other words, I would have expected a Windows developer to respond to "doesn't compile on Linux" the same as they would for e.g. "doesn't compile on Plan 9", or whatever is still very obscure relative to Windows nowadays.


I used to post on test failures in Perl modules when building `perl` with MSVC on Windows. Just two[1] examples[2].

Also, there is this gem[3]:

> I still run into modules that try to create temporary files in the root directory of my C: drive. That usually happens due to the script clearing the environment and not saving temporary directory locations. This is an unfortunate interaction with File::Spec->tmpdir which defaults to trying to write to the root (hey, Windows 95 allowed it!) of the current drive if it can’t locate the customary directories. I think File::Spec->tmpdir ought to croak if the environment does not contain one of TMP, TEMP, or TMPDIR, instead of offering C:\system\temp or C:\temp or /tmp or / on Windows. Regardless of File::Spec’s behavior, scripts, modules, etc should not delete those environment variables.

[1]: https://www.nu42.com/2014/12/yeah-you-put-me-in-my-place-rea...

[2]: https://www.nu42.com/2014/11/fixing-hard-coded-file-path-in-...

[3]: https://www.nu42.com/2018/03/dont-complicate-things.html


They are used in CDE [1] and ET++ (an X11 toolkit that Erich Gamma worked on before Design Patterns).

[1] https://en.wikipedia.org/wiki/Common_Desktop_Environment


I've honestly never seen .C used in the wild. I imagine anyone who's written a bit of portable code would immediately realize this is a bad idea.


Haven't really seen it in the wild, but I see it every day in a very large proprietary codebase I work in.


I only know it as an extension for "interpreted C++ scripts" as used by CERN's ROOT. It is considered good practice to make them compilable but the interpreter used to be very lenient (before they integrated it with clang) so that usually didn't happen.

Anyway I would never rely on the C compiler invoking the C++ compiler; I always write g++ or $CXX. I wonder if there is a downside to that.


Just for an example, the OpenFOAM codebase has 'em.


This is because the Windows filesystems (in backwards compatibility with DOS) are not case sensitive, so the compiler doesn't distinguish between .C and .c


Nowadays, they’re case-preserving, so the compiler _could_ distinguish the two.

Historically, though, lowercase characters weren’t allowed in file names (even though the on-disk format would have allowed it), so compilers couldn’t make the distinction. Given that using “.C” has fallen out of fashion (if it ever was fashionable), I don’t see much pressure to add that functionality, especially given that it might break compilation of C source code copied over from old times that uses .C.


I've had similar fun on macOS, where IIRC the project had both an API.h and an api.h file.


Is this something that happens because someone thinks C/C++ is one language? I often see this single mythical language referred to in job postings, blog posts from folks with a range of experience in the field, and even in neophytes who got terrible information from their Java instructor at university. But I don't think I've ever come across it in an actual codebase.

I have seen .cc for C++ and it annoys me, but seems rather common.


>> Is this something that happens because someone thinks C/C++ is one language?

No[1]:

> C++ source files conventionally use one of the suffixes `.C`, `.cc`, `.cpp`, `.CPP`, `.c++`, `.cp`, or `.cxx`; C++ header files often use `.hh`, `.hpp`, `.H`, or (for shared template code) `.tcc`; and preprocessed C++ files use the suffix `.ii`. GCC recognizes files with these names and compiles them as C++ programs even if you call the compiler the same way as for compiling C programs (usually with the name gcc).

[1] https://gcc.gnu.org/onlinedocs/gcc/Invoking-G_002b_002b.html
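To make the connection to the case problem concrete, here is a toy model of suffix-based language detection. It is not how GCC or MSVC are implemented; `guess_language` and its `case_sensitive` switch are invented for illustration, and the suffix sets come from the quoted documentation plus the usual `.c` for C.

```python
import os

# C++ suffixes from the GCC documentation quoted above.
CXX_SUFFIXES = {".C", ".cc", ".cpp", ".CPP", ".c++", ".cp", ".cxx"}
# The conventional suffix for C sources.
C_SUFFIXES = {".c"}


def guess_language(filename, case_sensitive=True):
    """Guess the language from the suffix, optionally ignoring case."""
    ext = os.path.splitext(filename)[1]
    if not case_sensitive:
        # A tool that ignores case can no longer tell ".C" from ".c".
        ext = ext.lower()
    if ext in C_SUFFIXES:
        return "C"
    if ext in CXX_SUFFIXES:
        return "C++"
    return "unknown"


if __name__ == "__main__":
    print(guess_language("main.C"))
    print(guess_language("main.C", case_sensitive=False))
```

With case taken into account `main.C` is classified as C++; once case is ignored it falls into the C bucket, which is the behaviour the article complains about.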


This doesn’t speak to why the problem occurs. You didn’t even answer the question you quoted.


I don't work in C or C++ but I'm inferring from the context and other comments that it's actually because .c (lower case) is used to indicate C language code and .C (upper case) means C++ code. This then fails on Windows because the OS is not case-sensitive wrt filenames.

So to make the implied answer to the question explicit: no, it's because someone who knew the difference between C and C++ sought to distinguish between their source code files using a filename extension convention that becomes invisible on a different OS.


They are incredibly similar, though. And they inter-link. It's easier to have one program that contains C and C++ code than it is to have one program that contains Python 2 and Python 3 code.


> They are incredibly similar, though.

Not even close. C++ is a vastly more complex language with a very different philosophy from that of C.

You might argue that C can very roughly be treated as a subset of C++, but this really is a very rough approximation. Which is to say, really, that it's wrong.

> And they inter-link

It's generally very easy to call C code from C++ code, yes. It is not easy to call C++ code from C.


> C can very roughly be treated as a subset of C++

Well, 99% of the time this holds, and you can generally use the same tooling in different modes for both. And they share a preprocessor.

> It's generally very easy to call C code from C++ code, yes. It is not easy to call C++ code from C.

Significantly easier than almost any other pair of languages, though. Especially if you take a little care on the C++ side (extern "C") or use COM or similar.

Again, C is more compatible with C++ than python 2 is with python 3. But not the other way round.


> > It is not easy to call C++ code from C.

> Significantly easier than almost any other pair of languages, though.

I wouldn't say so. If you're using the features of C++ then you'd need to manually wrap it all to expose a C API.

Accessing C code from C++, Ada, Rust, Zig, Java, Python, C#, or just about any other language, would be easier than manually wrapping C++ to expose a C API.

(The LLVM compiler, written in C++, does this. It exposes a subset of its API as a C API. I get the impression it's no small task, or they'd expose the full LLVM library that way.)

> Especially if you take a little care on the C++ side (extern "C") or use COM or similar.

Agreed, but going the extern "C" route implies you're using C++ as a better C, rather than making full use of C++. I have to admit I don't know much about working with COM. It's vaguely like GObject, right? I imagine it must take quite a bit of work to expose a C++ API that way.


"99% of the time" may be stretching it a bit when you have to prevent yourself from using modern and useful C features in order to get a C++ compiler to accept your codebase.


> Is this something that happens because someone thinks C/C++ is one language?

Nobody thinks that. Everybody knows what it means. Picking on it is like picking on people referring to amd64 as x86.


Outside the world of people who do write C or C++, most people I talk to really do bucket the two together as if they were largely interchangeable or, at least, a lot more similar than they actually are.


Bucketing them together is fine. There are technical and organizational reasons for that. When you're working with C++, you're almost guaranteed to need to deal with C as well, so jobs for "C/C++ developers" make perfect sense. There is also a certain level of expectation that future versions of both languages keep incompatibility between one another to a bare minimum.

I'd happily apply for a "C/Rust" job as well.


Right. On one project I was working on, there was someone who had to write some JavaScript and would always refer to that language as Java. He was a developer, literally using one of the languages, so should have known better. And Java and JavaScript have less in common than C and C++.


People think that. amd64 and x86 today mean the same thing, C and C++ not quite. I've also seen HR people think that java and javascript are the same thing.


In the olden days around VS 6, Microsoft only had one "C/C++" compiler. Needless to say, it was not very standards-compliant, but C was still mostly a C++ subset, and the differences were anyway smaller than the deviations of MSVC from the C++ standard. So it made sense to just compile everything as C++.


AFAIK Microsoft has never had a C compiler. C is enough of a subset of C++ that you can use MSVC to build most programs, but MSVC doesn't officially support C to this day. C++ tries to bring in everything of the latest C standard, but sometimes that isn't possible, or C++ has a better way (better generally because of some issue that doesn't exist in C in the first place). The C standard committee is aware of C++, and tries not to break the ability of C++ to use new C standards, but nobody is perfect.


> AFAIK microsoft has never had a C compiler.

Well, MSVC compiles .c files as C, which I guess makes it also a C compiler? It supports most of C99 by now.

> MSVC doesn't officially support C to this day.

Oh, it does: https://docs.microsoft.com/en-us/cpp/build/walkthrough-compi...


They had a standalone C compiler for kernel drivers.


Is that something people do? I have never seen a C++ file with a .c extension. I have seen plenty of C++ headers with .h extension, though.


I think it meant uppercase .C instead of lowercase .c as a C++ extension. Indeed it's pretty rare.

That being said, there are indeed projects using the lowercase .c extension for their "C++" files: gdb [1][2]

[1] https://github.com/bminor/binutils-gdb/blob/3e5fac07975a310c...

[2] https://github.com/bminor/binutils-gdb/blob/master/gdbsuppor...


I see, thanks for clearing that up. (I have obviously also not seen the uppercase .C used before.)


gcc does as well.

The use of .c for a C++ file is probably a project that decided, decades later, to switch from C to C++ and didn't want to go through the hassle of renaming all of their source code files to reflect the language switch.


This is about C++ files with a .C extension, not a .c extension.


Semi-related: There is now a way to set a case-sensitive flag for NTFS directories: `fsutil.exe file setCaseSensitiveInfo C:\folder enable`

Here’s an article about it: https://www.howtogeek.com/354220/how-to-enable-case-sensitiv...


I would suspect using that option would only open up a bunch of other unexpected issues, only because Windows programs have been written expecting that case insensitive file system.

Flipping that setting might make the file system case sensitive, but those Windows applications would still be working as if they were running in a case insensitive world.


Depends on the application I would think. If they do everything properly, they should let the file system handle it and not care themselves about case-sensitivity.


The problems start whenever a program asks the user for a file name and then saves that raw user input.

For example, let's imagine some sort of batch processing application that runs user-defined scripts.

To define some batch process, the user provides the name of the script to be run, so they enter 'MyScript.py' even though the file lives on disk as 'myscript.py'.

That batch processing application then saves the user-supplied file name 'as is' and everything works fine in the default Windows file system mode.

However, once that file system option is flipped to make it case sensitive, the batch processing will suddenly fail.

As Windows applications don't have to deal with case sensitivity, they don't do things like check if the user has entered the name correctly in terms of case.

They just ask the file system if the file exists and likewise, Windows is not checking the case when it does that 'exists' check.
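To illustrate what such an application would have to do itself, here is a small sketch of an existence check that insists on the exact spelling, whatever the file system does. It compares the requested name with the names actually stored in the directory. `exists_with_exact_case` is a made-up helper, and it only verifies the last path component, not the parent directories.

```python
import os


def exists_with_exact_case(path):
    """True only if the final component of `path` is stored with that exact case."""
    directory, name = os.path.split(os.path.abspath(path))
    try:
        # listdir() reports names as stored on disk, even when lookups
        # on that file system ignore case.
        return name in os.listdir(directory)
    except OSError:
        # The parent directory is missing or unreadable.
        return False
```

A program that validated the user's 'MyScript.py' this way at entry time would keep working after the directory is switched to case-sensitive mode.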


> They just ask the file system if the file exists and likewise, Windows is not checking the case when it does that 'exists' check.

I haven’t tried it, but I would assume that’s exactly what happens. The file system, with the case-sensitive flag, should return "File does not exist".


Yeah, I just tested this, works exactly as expected. After creating `Hallo.txt` and calling `File.Exists(@"C:\folder\hallo.txt")`, I get `false` with the attribute enabled, and `true` with it disabled


key being "properly". Many programs, especially games, fail under wine because of case sensitiveness and need workarounds.


That’s a different problem though. It sounds like that’s programs calling Name.xyz when the file is called name.xyz

That’s probably also the reason why there is no direct way to enable case-sensitivity recursively ;)


That was always possible using some obscure registry flag. Only people compiling unix projects used that in desperation.


I looked it up. Assuming you are talking about the keys mentioned here [0], that is very different, as the `fsutil` command actually makes the normal Windows calls on NTFS behave with proper case sensitivity.

[0]: https://superuser.com/questions/266110/how-do-you-make-windo...


No. I was referring to the Cygwin posix=1 mount option, and the 2 FILE_CASE_ flags of GetVolumeInformation(), which a minority used.

Before XP, flag 1 meant case-sensitive; now it means case-preserving, and flag 2 handles case-sensitive search.

Stackoverflow talks about NFS and something completely different.


I had a fun time trying to figure out why a dependency project, my own, was not building via FetchProject in CMake. macOS by default, like Windows, is case-preserving but case-insensitive too. My CMakeLists.txt file was named CMakelists.txt, and Linux ext4 is case-sensitive. CMake really shouldn't care, or should use all lowercase everywhere.


IMO filesystems should treat different characters as different. Case-insensitivity at the storage level is always a mistake. It's occasionally fine for search or display, but it's a lossy operation. Sadly legacy compatibility has kept this choice on a number of filesystems.


Case insensitive filesystems are a horrible idea, because you have to take encodings into account and the Turkey test, which means that either your normalization is broken or something will be equal under en_US but not tr_TR.

Normalization should be done in the file manager or in a gui file picker, and not at the filesystem level.
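A quick sketch of the Turkey test, for anyone who hasn't met it. Python's `str.lower()` is locale-independent, so the Turkish rule below is a hand-rolled approximation (I maps to dotless ı, İ maps to i) rather than a real locale API; the function names are made up.

```python
def lower_default(name):
    """Locale-independent lowercasing, as most en_US-minded code does it."""
    return name.lower()


def lower_turkish(name):
    """Approximate Turkish lowercasing: I -> dotless i (U+0131), I-with-dot (U+0130) -> i."""
    return name.replace("I", "\u0131").replace("\u0130", "i").lower()


def same_name(a, b, fold):
    """Compare two file names after applying the given folding function."""
    return fold(a) == fold(b)


if __name__ == "__main__":
    print(same_name("FILE.TXT", "file.txt", lower_default))
    print(same_name("FILE.TXT", "file.txt", lower_turkish))
```

The same pair of names is "equal" under one folding rule and different under the other, so a file system that folds case has to pick one rule and be wrong for somebody.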


Exactly. But Windows and MacOS were made with the US-centric view that letters of different case are identical (they're not, thankfully even ASCII didn't make that mistake), and now we're stuck with it due to large amounts of legacy software relying on that assumption.


I agree, but I think that the tooling should recognize and handle these cases uniformly. On a case-sensitive system, the not-found error handling should look for names that differ only in case and warn that this might be the cause.

Ideally for the cmake tool, the filename should not have anything but lowercase letters too.
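Roughly what I have in mind, as a sketch rather than anything CMake actually does: when a file is not found, list the directory and mention any entry that matches except for case. `case_variants` and `open_with_hint` are hypothetical names.

```python
import os


def case_variants(path):
    """Names in the same directory that equal the requested name except for case."""
    directory, name = os.path.split(os.path.abspath(path))
    try:
        entries = os.listdir(directory)
    except OSError:
        return []
    wanted = name.casefold()
    return sorted(e for e in entries if e != name and e.casefold() == wanted)


def open_with_hint(path, mode="r"):
    """Like open(), but a failed lookup mentions case-only near misses."""
    try:
        return open(path, mode)
    except FileNotFoundError as err:
        variants = case_variants(path)
        if variants:
            hint = ", ".join(variants)
            raise FileNotFoundError(f"{path} not found; did you mean {hint}?") from err
        raise
```

On a case-insensitive system the `open()` simply succeeds, so the hint only ever appears where the mismatch actually breaks something.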


I don't find any good reason to use .C instead of .cpp, .Cpp, .cxx, .Cxx, or similar. Those two extra characters of information help compilers and other people a lot.


I haven't seen a "big-C" file in years. Does any project that started after 1999 still use them? Anything I have ever read either uses .cpp, .cc, or more rarely .cxx.


It's mildly upsetting to me that the usual file extension for C++ isn't .c++, or for that matter .C++

Why shouldn't it be?


According to wikipedia "+" is a reserved (prohibited) character in old/classic FAT file names https://en.wikipedia.org/wiki/Filename#Filename_extensions

My recollection is that the first Unix C++ compilers (Cfront?) used ".cc" for C++ and Microsoft started using ".cpp", probably because of the above restriction.


`.cpp` is very common. I never understood the point of `.C`; even ignoring issues with case sensitivity, it is too easy to visually confuse with `.c`, and doesn't even have much of a mnemonic connection to the name C++.


Probably because "+" is a special character in most shells. You'd have to quote the filename every time you did anything with it.

Edit: morning brain fart. Actually, I guess "+" is not all that special.



