

C++ Filesystem Technical Specification approved by ISO - adamnemecek
http://article.gmane.org/gmane.comp.lib.boost.devel/256220

======
huhtenberg
As someone who writes portable high-performance C++ code for a living, I see this
as no more than a great prototyping aid.

Adding a non-trivial system abstraction to your code - be it a file system or
a socket/networking API - is always a tricky task, even for the simplest of
requirements. For example, the Windows API is inherently UTF-16, so if you want
good performance, you need to keep all your filesys references in wstrings.
Everywhere else, you'll need strings instead. If you are to abstract this,
you have a choice of either using an opaque "filesystem_string" type that maps
to wstring/string (with an API for cross-conversion) or defining, say, a UTF-8
API and then doing a boatload of run-time Unicode conversion on Windows. The
fun part is that while this makes the filesystem code more compact, it has a
negative effect on the rest of the application. The app will either need to
deal with conversions to/from "filesystem_string" when serializing/displaying
them or the abstraction layer will be busy converting utf8's to the native
string format. Either way you pay with performance for "portability".
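
For illustration, a minimal sketch of the opaque-string option, assuming a C++17 compiler where this TS ships as `std::filesystem` (under the TS itself the namespace is `std::experimental::filesystem`). `path::string_type` is `std::wstring` on Windows and `std::string` on POSIX systems, and `native()` returns the stored string without converting it. The names `native_string`, `for_os_call` and `for_display` are made up for this example.

```cpp
#include <filesystem>
#include <string>
#include <type_traits>

namespace fs = std::filesystem;

// The platform's own string type: std::wstring on Windows, std::string on POSIX.
using native_string = fs::path::string_type;

static_assert(std::is_same<native_string::value_type, fs::path::value_type>::value,
              "string_type is a basic_string of the native character type");

// No conversion: returns the string exactly as the path stores it.
const native_string& for_os_call(const fs::path& p) {
    return p.native();
}

// Conversion (if any) happens here, only when a narrow string is needed
// for display or serialization.
std::string for_display(const fs::path& p) {
    return p.filename().string();
}
```

On Windows the cost being argued about lands in `for_display`, since `string()` has to convert from wide characters; on POSIX both functions are effectively free.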

This doesn't apply to just this new C++ TS. It's the same issue with the
existing standard POSIX API as well (fopen/fread/etc). It's great for hacking
together something that builds on Linux, BSD and Windows, but you'd typically
want to put in your own system interface before shipping a production version.

~~~
roel_v
" so if you want good performance, you need to keep all your filesys
references in wstrings."

Why? For path manipulation, you can still use whatever format internally that
you want. Only just before the native API calls you'll need to convert. I'm
open to many arguments on performance, but I have a hard time believing that
this conversion would matter (or even be measurable) relative to the time the
actual I/O will take.

Filesystem is a great addition to the C++ standard library. If you're
seriously claiming that filesystem will not be suitable for production code, I
can't help but put you in the (hypothetical) group of people who say 'the
standard allocator is only a great prototyping aid, you'd typically want to
put in your own allocator before shipping production version.'. Uh no,
millions of applications ship just fine with the standard allocator, just like
thousands (or maybe even hundreds of thousands) of applications are already
shipping just fine with boost filesystem.

~~~
MaulingMonkey
>> " so if you want good performance, you need to keep all your filesys
references in wstrings."

> Why?

I'd take this even further: The less (re)interpretation the better. Will my
UTF16 <-> UTF8 conversion roundtrip my malformed UTF16 codepoints? If not,
I've just lost compatibility with (admittedly buggy) software. But I've seen
bugs from far tamer things, such as the simple act of converting from a fixed
width wchar_t[] that was observed to be a NUL terminated UTF16 string to
std::wstring. It turned out that under certain circumstances that couldn't be
tested, it was an opaque binary blob with NULs in the middle... and the bit
after the NULs being important. Whoops!

Less relevant here though: std::experimental::filesystem::v1::path::value_type
looks to match the native OS character type. But then again, is that unichar
(16-bit) or wchar_t (32-bit) on iOS? Fufufufu~

It's also just plain _easier_ when you're dealing with native filesystem APIs
for whatever reason to work in the native character type, which is the main
reason I'd prefer it.

> but I have a hard time believing that this conversion would matter (or even
> be measurable) relative to the time the actual I/O will take.

RE: Performance: If you're doing actual I/O, agreed. If you're dealing with OS
or application cached filesystem entries, it becomes more believable. If
you're enumerating over and only conditionally performing I/O over hundreds of
thousands of paths - you'll want to be quite cautious of conversions if you
care about performance (potentially see: directory enumeration via backups,
stale file scans during builds, or comparing file lists between VCS
revisions...) It drives me absolutely insane when a 'nothing changed' build
takes 10, 20 seconds just from filesystem scanning. And I've seen and fixed
far worse (think O(n*n) scans -_-;;). As others have noted though, if you keep
your conversions 'reasonable' you should be able to get away with UTF8 paths
from a performance perspective.

> Filesystem is a great addition to the C++ standard library.

Agreed.

~~~
kuschku
Also another problem:

UCS-2, commonly used as implementation for UTF-16, doesn’t support all the
characters that UTF-8 does, which can lead to other nice problems during
conversion.

~~~
kevin_thibedeau
UTF-16 covers the full Unicode space just like UTF-8. You're thinking of UCS-2
(limited to plane 0) as implemented by Windows.

~~~
kuschku
Which can create nice issues, especially because stuff like Java is limited to
UCS-2...

~~~
vardump
> Which can create nice issues, especially because stuff like Java is limited
> to UCS-2...

Not true. Java uses UTF-16, not UCS-2. There are no encoding representation
issues.

~~~
heinrich5991
I believe strings in Java are _not_ checked for UTF-16, but are just arrays
of 16-bit integers and as such used as UCS-2 by most Java code.

~~~
jerven
Most code written in the US/UK does not care about multiple code points being
used to represent one glyph. But that does not change the fact that the String
representation inside Java has actually been encoded in UTF-16 since Java 1.5
(i.e. since 2004).

Actually dealing with multi-code-point characters is hard in any programming
language, and most people ignore it because GUI toolkits often already take
care of it.

By the way, any string encoded in UTF-16 is an array of 16-bit unsigned
integers. It just has a variable-length encoding, which UCS-2 did not have, in
the same way that UTF-8 is an array of bytes just like 8-bit ASCII.
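
To make the variable-length point concrete, here is a small C++ sketch (the function name `count_code_points` is made up, and it assumes well-formed UTF-16 input). A code point outside plane 0 takes two 16-bit units, a high and a low surrogate, so the array length and the code point count differ.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a well-formed UTF-16 string by skipping
// the low (trailing) half of every surrogate pair.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t unit : s) {
        // Low surrogates occupy the range 0xDC00..0xDFFF.
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

For `u"a\U0001F600"` the string holds three 16-bit units but only two code points. Code that treats the array as UCS-2 would see three "characters".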

------
rikkus
I like to be able to inject a 'filesystem' object into code for testing, so as
not to have to use a real filesystem. This doesn't look like it supports that,
which is a shame.

My other tactic is to create a temp directory at the start of a test and do
_everything_ within it, then clean it up at the end.

Any other ideas on how to write testable code which uses this filesystem API?
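
For reference, the temp-directory tactic might look roughly like this with C++17 `std::filesystem` (an assumption; the TS namespace is `std::experimental::filesystem`). `ScratchDir` is a hypothetical helper, and the fixed directory name is a simplification: tests running in parallel would need unique names.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// RAII guard: creates a scratch directory under the system temp directory
// and removes it, with everything inside, when the guard goes out of scope.
class ScratchDir {
public:
    explicit ScratchDir(const std::string& name)
        : root_(fs::temp_directory_path() / name) {
        std::error_code ignored;
        fs::remove_all(root_, ignored);   // clear leftovers from a crashed run
        fs::create_directories(root_);    // throws fs::filesystem_error on failure
    }

    ~ScratchDir() {
        std::error_code ignored;          // non-throwing overload: destructors must not throw
        fs::remove_all(root_, ignored);
    }

    ScratchDir(const ScratchDir&) = delete;
    ScratchDir& operator=(const ScratchDir&) = delete;

    const fs::path& path() const { return root_; }

private:
    fs::path root_;
};
```

A test would construct one `ScratchDir` at the top and build every path it touches from `path()`.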

~~~
stinos
_Any other ideas on how to write testable code which uses this filesystem
API?_

Not nice ones (or I must be missing something). Could either make all code
take the API as a template parameter, or wrap the API in your own set of
classes. Now often the logic you really want to (unit) test isn't the one
directly accessing the filesystem but rather a layer higher? And in case of
integration/system tests (or whatever it's called these days) wouldn't you
rather use the actual filesystem, like with the temp directory you talk about?
Can you provide a concrete example of what would be hard to test with this?
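
A rough sketch of the template-parameter option, assuming C++17 `std::filesystem`. All names here (`RealFs`, `FakeFs`, `pick_config`) are invented for illustration, and only one operation is wrapped; a real wrapper would need every call the code under test uses.

```cpp
#include <filesystem>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Production policy: forwards to the real file system.
struct RealFs {
    bool exists(const fs::path& p) const { return fs::exists(p); }
};

// Test policy: answers from an in-memory set and never touches the disk.
struct FakeFs {
    std::set<std::string> files;
    bool exists(const fs::path& p) const {
        return files.count(p.generic_string()) != 0;
    }
};

// Logic under test: prefers the user's config file when it exists.
template <class Fs>
fs::path pick_config(const Fs& fsys, const fs::path& user, const fs::path& fallback) {
    return fsys.exists(user) ? user : fallback;
}
```

Production code calls `pick_config(RealFs{}, ...)`, while a unit test fills a `FakeFs` with the paths it wants to exist.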

~~~
rikkus
It's third party code I worry about, where it hasn't been written to expect a
root directory. I can chroot on *NIX, I suppose.

Perhaps I should go and have a look at how code using boost normally deals
with dependency injection for testing. Maybe I'm missing something.

------
ridiculous_fish
It looks like most functions have two variants that differ in how they handle
errors. One throws exceptions, and the other returns an error_code.

It's nice to have a non-throwing variant, although the rationale is suspect: it
asserts that the non-throwing variants are for "when file system errors are
routine," but many large C++ codebases deliberately disable exceptions
altogether: Google, Mozilla, LLVM, etc.

Is there precedent for this two-variant approach in other APIs that are part
of the C++ standard?

Oh, and this just seems wrong:

> Otherwise, clear() is called on the error_code& argument

Why clear it on success, instead of leaving it alone (like errno)? This makes
it harder to sequence calls and then check the error code at the end.
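
A small sketch of what that means in practice, assuming C++17 `std::filesystem` (same overload pairs as the TS). `last_write_time` is just an arbitrary function that has an `error_code` overload, and `cleared_on_success` and `first_error` are made-up helper names. Keeping the first error across a sequence takes a second variable:

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Shows that a successful call resets the error_code handed to it.
bool cleared_on_success(const fs::path& missing, const fs::path& present) {
    std::error_code ec;
    (void)fs::last_write_time(missing, ec);   // fails: ec is set
    const bool was_set = static_cast<bool>(ec);
    (void)fs::last_write_time(present, ec);   // succeeds: ec is cleared
    return was_set && !ec;
}

// errno-style sequencing has to be done by hand: copy out the first failure.
std::error_code first_error(const fs::path& a, const fs::path& b) {
    std::error_code first, ec;
    (void)fs::last_write_time(a, ec);
    if (ec && !first) first = ec;
    (void)fs::last_write_time(b, ec);
    if (ec && !first) first = ec;
    return first;
}
```

The throwing overloads are the same calls without the `ec` argument; they report failure through `fs::filesystem_error`.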

~~~
beached_whale
new/delete have done this for a while.
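
For comparison, a minimal sketch of that precedent (the function name `try_make_buffer` is made up): plain `new` throws `std::bad_alloc` on failure, while the `std::nothrow` form returns a null pointer instead.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Nothrow form: yields a null pointer on allocation failure
// instead of throwing std::bad_alloc.
std::unique_ptr<int[]> try_make_buffer(std::size_t n) {
    return std::unique_ptr<int[]>(new (std::nothrow) int[n]());
}
```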

I wonder if this is because it is the boost filesystem v3 library mostly.

------
Nitramp
From skimming the spec, it seems as if this does not include a file
system abstraction. Operations are based on `path` objects; there's no
additional state or context that'd allow e.g. implementing a virtual in-memory
file system without support from the operating system, or an additional user
context to be used for a networked file system that needs user credentials,
etc.

In effect this looks a lot like Java's java.io.File (modulo some of the design
warts in that), which in Java land people are trying to replace with
FileSystem and Path
([http://docs.oracle.com/javase/7/docs/api/java/nio/file/FileS...](http://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystem.html)).
Java's Path is specific to a file system, so that users can implement their
own FileSystemProvider
([http://docs.oracle.com/javase/7/docs/api/java/nio/file/spi/F...](http://docs.oracle.com/javase/7/docs/api/java/nio/file/spi/FileSystemProvider.html)).

I wonder why they went that way? Or am I missing something?

~~~
roel_v
The scope is just to provide a way to write cross-platform path manipulation
code. C++ is a much more stable (for better or for worse) language than Java
is. Filesystem has been developed for 10+ years, and now that the current
version works well, it's promoted to the standard. Whoever will want to write
virtual file systems isn't going to bother with std::filesystem anyway.

~~~
sedeki
Newbie here. Just curious, why wouldn't someone use std::filesystem to write
virtual file systems?

~~~
roel_v
Not sure I understand the question - std::filesystem is 'just' a (mostly)
cross-platform abstraction of files and directories, it doesn't know or deal
with the concept of 'filesystem'. (maybe the problem is in terminology -
'filesystem' in std::filesystem doesn't have anything to do with 'filesystem'
as in ext4, zfs etc.)

------
cokernel_hacker
By being standardized, it is opening itself up to being the de facto
file-system API of choice for C++ programmers.

This wouldn't be a bad thing if the API wasn't broken by design. It is opening
itself up to 'time of check to time of use' bugs because it is completely
oriented around paths.
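
A sketch of the bug class, assuming C++17 `std::filesystem`; the function names are made up. In `read_checked` the path is resolved twice, and another process can delete or swap the file in between. `read_all` opens once and works through the stream. That removes this particular check, but not the general problem: any later path-based call resolves the name again, which is why handle-based APIs such as POSIX `openat` and `fstat` exist.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Opens first and reads through the resulting stream, so there is no
// separate check that can go stale. Failure is discovered at time of use.
std::optional<std::string> read_all(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    if (!in) {
        return std::nullopt;
    }
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Racy pattern: asks a question about the path, then acts on the path later.
std::optional<std::string> read_checked(const fs::path& p) {
    if (!fs::is_regular_file(p)) {   // time of check
        return std::nullopt;
    }
    // Another process may delete or replace the file right here.
    return read_all(p);              // time of use
}
```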

I can't believe that this was approved.

I was a professional file-system hacker until quite recently and this API
seems like exactly the wrong thing.

------
toolslive
So how does this compare to boost::filesystem ?

~~~
ksherlock
it is boost::filesystem.

------
gp7
The one boost error I've ever had come out of a program was from
boost::filesystem, but to be fair, I don't think the difference between
Windows' "mklink /J" and "mklink /D" is documented properly anywhere.

~~~
ianhedoesit
The major difference I've found is that /d is more akin to *nix symlinks with
few exceptions, and /j is an NTFS-specific, local-only, directory link. Links
made with the /d option can point to files or directories on a local or remote
file system.[1] Junction points can only point to local directories, and have a few
other limitations in regards to the Windows startup process.[2] As far as I
can tell, symbolic links created with mklink /d are just improved versions of
NTFS junction points.

[1]
[http://en.wikipedia.org/wiki/Symbolic_link#Microsoft_Windows](http://en.wikipedia.org/wiki/Symbolic_link#Microsoft_Windows)

[2]
[http://en.wikipedia.org/wiki/NTFS_junction_point](http://en.wikipedia.org/wiki/NTFS_junction_point)

------
ape4
So what's new here? Of course C already has fopen(), chmod() and readdir().
C++ already has fstream.

~~~
etimberg
This is about things like path resolution, directory permissions, file
permissions, directory iterators, etc.
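
As a taste of the directory-iterator part, a minimal sketch assuming C++17 `std::filesystem`; the function name `names_in` is made up. Neither `readdir()` nor fstream gives you this portably.

```cpp
#include <algorithm>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Returns the names of all entries directly inside dir, sorted.
// Throws fs::filesystem_error if dir cannot be opened.
std::vector<fs::path> names_in(const fs::path& dir) {
    std::vector<fs::path> names;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        names.push_back(entry.path().filename());
    }
    std::sort(names.begin(), names.end());
    return names;
}
```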

