
Tar and Curl Come to Windows - tdurden
https://blogs.technet.microsoft.com/virtualization/2017/12/19/tar-and-curl-come-to-windows/
======
derekp7
I'm looking for a Windows tar implementation that can save/restore Windows
ACLs/DACLs and/or other extended attributes. Was thinking of taking the Red
Hat GNU tar patches that handle Linux xattrs and SELinux attributes, and
adapting the same technique for Windows, but haven't learned enough of the
Windows API yet to make it happen. As for getting that info stored into the
tar file, I believe this can be done by using PAX tar format (which can take
any arbitrary name/value pairs without losing tar compatibility).

Specifically, I need this functionality to make my backup program (Snebu) more
useful to Windows users, as it relies on an installed tar implementation on a
client to gather files.

If anyone here is skilled in the Windows API and is interested in helping with
this, let me know.

~~~
eps
You want NtQuerySecurityObject to get a binary SD blob, convert it to SDDL
(text) format using the corresponding API, and store the result.

~~~
Const-me
That thing’s unsupported for user mode apps. The correct API is
GetSecurityInfo().
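
Roughly, the capture side could look like the sketch below (error handling abbreviated): read the descriptor with GetSecurityInfo(), then serialize it to SDDL text that an archive header could carry. The restore side would mirror it with ConvertStringSecurityDescriptorToSecurityDescriptorW() plus SetSecurityInfo().

    // Sketch: capture one file's owner/group/DACL as SDDL text.
    #include <windows.h>
    #include <aclapi.h>
    #include <sddl.h>
    #include <stdio.h>
    #pragma comment(lib, "advapi32.lib")

    int wmain(int argc, wchar_t** argv)
    {
        if (argc < 2) return 1;
        HANDLE h = CreateFileW(argv[1], READ_CONTROL, FILE_SHARE_READ, nullptr,
                               OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, nullptr);
        if (h == INVALID_HANDLE_VALUE) return 1;

        PSECURITY_DESCRIPTOR sd = nullptr;
        const SECURITY_INFORMATION info = OWNER_SECURITY_INFORMATION |
            GROUP_SECURITY_INFORMATION | DACL_SECURITY_INFORMATION;
        DWORD err = GetSecurityInfo(h, SE_FILE_OBJECT, info,
                                    nullptr, nullptr, nullptr, nullptr, &sd);
        CloseHandle(h);
        if (err != ERROR_SUCCESS) return 1;

        LPWSTR sddl = nullptr;
        if (ConvertSecurityDescriptorToStringSecurityDescriptorW(
                sd, SDDL_REVISION_1, info, &sddl, nullptr))
        {
            wprintf(L"%s\n", sddl);  // e.g. O:BAG:SYD:(A;;FA;;;SY)(A;;FA;;;BA)
            LocalFree(sddl);
        }
        LocalFree(sd);
        return 0;
    }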

The main problem with backing up/restoring these things is SIDs. Windows ACLs
are much more sophisticated than the nine permission bits in Linux.

It will work in some cases, e.g. when the owners are, and all the permissions
are granted/denied to, a well-known (1) group such as "Everyone". Or when all
these SIDs are domain user SIDs and you’re restoring to a PC that’s on the
same domain.

It won’t work in many other cases, e.g. when these permissions were granted to
some local user and you’ve reinstalled Windows between backup & restore.

That’s why on Windows it’s typically considered OK not to bother backing
up/restoring these ACLs.
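
If you do want to carry them anyway, the rough Windows analogue of tar’s “store both the numeric ID and the name” trick is to keep the SID string together with its looked-up account name, and let a restore on another box prefer the name (via LookupAccountNameW). A minimal sketch; the well-known "Everyone" SID is used only to keep it self-contained:

    // Sketch: describe a SID two ways, as the raw S-1-... string and as
    // DOMAIN\name, so a restore elsewhere can fall back to the name.
    #include <windows.h>
    #include <sddl.h>
    #include <stdio.h>
    #pragma comment(lib, "advapi32.lib")

    static void DescribeSid(PSID sid)
    {
        LPWSTR sidText = nullptr;
        if (ConvertSidToStringSidW(sid, &sidText))
        {
            wprintf(L"sid  = %s\n", sidText);   // e.g. S-1-1-0
            LocalFree(sidText);
        }

        WCHAR name[256], domain[256];
        DWORD nameLen = 256, domainLen = 256;
        SID_NAME_USE use;
        if (LookupAccountSidW(nullptr, sid, name, &nameLen,
                              domain, &domainLen, &use))
            wprintf(L"name = %s\\%s\n", domain, name);  // the portable half
    }

    int main()
    {
        // Demo with the well-known "Everyone" SID; a backup tool would call
        // this for the owner/group SIDs pulled from each file's descriptor.
        BYTE buf[SECURITY_MAX_SID_SIZE];
        DWORD cb = sizeof(buf);
        if (CreateWellKnownSid(WinWorldSid, nullptr, buf, &cb))
            DescribeSid((PSID)buf);
        return 0;
    }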

(1) https://msdn.microsoft.com/en-us/library/windows/desktop/aa379649(v=vs.85).aspx

~~~
derekp7
Thanks, I will read up on that. The problem I'm trying to solve: I tried
restoring C:\Users from a tar backup, and it took quite a while of clicking to
get all the permissions sorted out.

As for "permissions were granted to some local user", in the tar file format
it stores a UID/GID number, as well as a username/groupname, so when restoring
to a different system the username takes precedence over the UID number. Could
the same idea work on Windows? (Granted, I'd probably have to store this in
the PAX header's name/value pairs, due to potentially longer length of the UID
numbers).

The other item I've read is that Windows file permissions can be inherited
from parent objects, so it may be useful to store those permissions, but not
restore them (or give a warning if the effective permissions end up being
different due to restoring to a different location).

~~~
Const-me
GetSecurityInfo() will return you the effective set of permissions/owners.
Good enough for a single file, inefficient for the complete hierarchy.
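
For the inheritance question above: each ACE records whether it was inherited (the INHERITED_ACE flag in its header), so you could store only the explicitly-set entries and let the target directory re-apply inheritance on restore. A small sketch, assuming the DACL came out of GetSecurityInfo():

    // Sketch: tell inherited ACEs apart from explicit ones in a DACL, so a
    // backup could record only the explicitly-set entries.
    #include <windows.h>
    #include <stdio.h>

    void PrintAceOrigins(PACL dacl)
    {
        ACL_SIZE_INFORMATION info;
        if (!dacl || !GetAclInformation(dacl, &info, sizeof(info), AclSizeInformation))
            return;
        for (DWORD i = 0; i < info.AceCount; i++)
        {
            ACE_HEADER* ace = nullptr;
            if (GetAce(dacl, i, (LPVOID*)&ace))
                printf("ACE %lu: %s\n", i,
                       (ace->AceFlags & INHERITED_ACE) ? "inherited" : "explicit");
        }
    }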

To back up the complete C:\Users directory to be restored later on the same
PC, I’d use the APIs specially designed for that: BackupRead/BackupWrite.

See this article for an overview: https://msdn.microsoft.com/en-us/library/windows/desktop/aa362520(v=vs.85).aspx

That article was written on the assumption that you’ll be using a real
hardware tape drive for backups, but you can back up to any other destination.
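
The shape of the read side is roughly the sketch below (not production code): open the file with FILE_FLAG_BACKUP_SEMANTICS, drain the backup stream in chunks, then make a final call with bAbort = TRUE to release the context. Restore is the mirror image with BackupWrite().

    // Sketch: drain a file's backup stream (data, alternate streams,
    // security) into a byte count; a real tool would write the bytes out.
    #include <windows.h>
    #include <stdio.h>

    int wmain(int argc, wchar_t** argv)
    {
        if (argc < 2) return 1;
        // To read files you can't otherwise access, the process also needs
        // SeBackupPrivilege enabled.
        HANDLE h = CreateFileW(argv[1], GENERIC_READ, FILE_SHARE_READ, nullptr,
                               OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, nullptr);
        if (h == INVALID_HANDLE_VALUE) return 1;

        BYTE buf[64 * 1024];
        DWORD read = 0;
        LPVOID ctx = nullptr;          // BackupRead keeps its position here
        unsigned long long total = 0;

        // TRUE for bProcessSecurity so the BACKUP_SECURITY_DATA stream is included.
        while (BackupRead(h, buf, sizeof(buf), &read, FALSE, TRUE, &ctx) && read > 0)
            total += read;

        // Final call with bAbort = TRUE releases the context.
        BackupRead(h, nullptr, 0, &read, TRUE, TRUE, &ctx);
        CloseHandle(h);
        wprintf(L"%llu bytes of backup stream\n", total);
        return 0;
    }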

This will handle file data (including alternate NTFS streams) and file
security, but you need to handle file names, file attributes, and timestamps
somewhere outside of that. And the TAR format can’t quite fit all that info:
file names in TAR are limited to 100 characters; there’s only one timestamp in
TAR, but NTFS has three; TAR’s timestamp resolution is 1 second, while NTFS
keeps 100-nanosecond precision; and finally there’s no place to keep file
attributes. I wouldn’t pick the TAR format for that kind of backup.

Also note that NTFS hard links and EFS-encrypted files both need special care
to back up.

~~~
derekp7
Thanks! I'll check that one out too. Note, I'm planning on using PAX
extensions for tar, so I can store an unlimited number of name/value pairs per
file and also handle unlimited-length filenames.
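
For context, each pax extended-header record is just "length keyword=value\n", where the decimal length counts the whole record including its own digits. A quick sketch of building one (the "SNEBU.ntacl" keyword is a made-up example):

    // Sketch: build a single pax extended-header record.
    #include <string>
    #include <cstdio>

    std::string PaxRecord(const std::string& keyword, const std::string& value)
    {
        // Everything after the length field: " keyword=value\n"
        const size_t tail = 1 + keyword.size() + 1 + value.size() + 1;

        // The length prefix counts itself, so iterate until the digit count settles.
        size_t len = tail + 1;
        while (std::to_string(len).size() + tail != len)
            len = std::to_string(len).size() + tail;

        return std::to_string(len) + " " + keyword + "=" + value + "\n";
    }

    int main()
    {
        // Prints: 34 SNEBU.ntacl=O:BAD:(A;;FA;;;BA)
        std::printf("%s", PaxRecord("SNEBU.ntacl", "O:BAD:(A;;FA;;;BA)").c_str());
    }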

~~~
Const-me
Yeah, PAX will probably work. But IMO it brings questionable value here.

You’ll surely be able to create a PAX-based format that your own tools will be
able to back up and restore correctly. But it won’t be compatible, i.e. no
other tool will be able to restore or read the data.

The BackupRead API returns zero or more streams, each prefixed with a
variable-length WIN32_STREAM_ID structure. For normal files without security
descriptors, you could just unwrap the content of the BACKUP_DATA stream into
the file content inside the TAR and you’re good.
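
The unwrapping loop is roughly the sketch below: read each WIN32_STREAM_ID header, look at its type and size, then copy or skip the payload. A real archiver would copy the BACKUP_DATA payload into the TAR entry instead of BackupSeek’ing past it.

    // Sketch: walk the WIN32_STREAM_ID headers in a file's backup stream.
    #include <windows.h>
    #include <stdio.h>

    int wmain(int argc, wchar_t** argv)
    {
        if (argc < 2) return 1;
        HANDLE h = CreateFileW(argv[1], GENERIC_READ, FILE_SHARE_READ, nullptr,
                               OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, nullptr);
        if (h == INVALID_HANDLE_VALUE) return 1;

        const DWORD headerSize = FIELD_OFFSET(WIN32_STREAM_ID, cStreamName);
        WIN32_STREAM_ID sid;
        BYTE name[1024];
        DWORD read = 0, lo, hi;
        LPVOID ctx = nullptr;

        while (BackupRead(h, (LPBYTE)&sid, headerSize, &read, FALSE, TRUE, &ctx)
               && read == headerSize)
        {
            // dwStreamId is e.g. BACKUP_DATA, BACKUP_ALTERNATE_DATA,
            // BACKUP_SECURITY_DATA or BACKUP_SPARSE_BLOCK.
            wprintf(L"stream id %lu, %lld payload bytes, name %lu bytes\n",
                    sid.dwStreamId, sid.Size.QuadPart, sid.dwStreamNameSize);

            // Only alternate data streams carry a name; read and discard it here.
            if (sid.dwStreamNameSize && sid.dwStreamNameSize <= sizeof(name))
                BackupRead(h, name, sid.dwStreamNameSize, &read, FALSE, TRUE, &ctx);

            // Skip the payload; a real tool would read BACKUP_DATA out instead.
            BackupSeek(h, sid.Size.LowPart, sid.Size.HighPart, &lo, &hi, &ctx);
        }

        BackupRead(h, nullptr, 0, &read, TRUE, TRUE, &ctx);  // release the context
        CloseHandle(h);
        return 0;
    }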

But even if you find a way to store the BACKUP_SECURITY_DATA stream in these
PAX extended name/value pairs, there’s other stuff to keep somewhere.
Alternate data streams are usually quite small when written by Windows
Explorer, dozens of bytes, but nothing prevents them from being gigabytes, and
I’m not sure PAX will be happy about values that large. There are also sparse
files: BackupRead won’t return the zeroes the way ReadFile does, it’ll return
a BACKUP_SPARSE_BLOCK stream instead, containing just the non-zero portions of
the file.

If you don’t bother parsing the backup data and unwrapping the streams, it’ll
still work for your tool, but your TARs will be incompatible with standard TAR
tools, which won’t see the content of your backup.

~~~
derekp7
What about storing each stream as a separate logical file in the tar file? So
you have the main file, with a property of "nt_stream_count=2", and the file
name for each stream would be the main filename followed by ":stream_id". That
way each stream would show up as a separate file upon restore (assuming a
standard tar is used for restoring), but the custom tar would know what to do.

Also, unrecognized pax data would be ignored by a regular tar.

And in my use case, when data is submitted by the client to the Snebu backend,
it has its own tar implementation built in for separating file data from
metadata, and it re-synthesizes a tar file upon restore. (The reason Snebu
uses the tar format for transferring is that it was a good way to make it
agentless for normal Unix/Linux servers, yet extensible enough to support
other use cases. Also, I figured that most tar implementations were fairly
well optimized, at least more so than I could do in a short amount of time.)

Thanks for your input, really appreciate it. Been a little bit tough learning
the WIN APIs, after a couple decades lost in Unix land.

~~~
Const-me
> What about storing each stream as a separate logical file in the tar file?

I think that’ll work. Just be sure to name these separate streams so that they
never conflict with other files that might be in the same directory or with
other streams that may be in the same file. E.g. in Windows there are
characters forbidden in file names that work fine in Linux:
https://stackoverflow.com/a/31976060/126995

> with a property of "nt_stream_count=2"

Read this: https://msdn.microsoft.com/en-us/library/windows/desktop/aa362667(v=vs.85).aspx

I don’t think just a count is enough. Alternate data streams have names. The
other stream types (extended attributes, the security descriptor, etc.) are
essentially unnamed.

You can implement some naming scheme for them, e.g. “file” for the main data,
“file:xxx” for the “xxx” alternate stream of the file, “file?sd” for the
file’s security descriptor, “file?sparse” for sparse file data. This can
result in TAR backups that are more or less readable by the standard tools,
but still allow your own tool to restore the complete thing.
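
As a sketch, the mapping can be as small as this; the suffixes are safe on the Windows side precisely because “?” and “:” are among the characters forbidden in NTFS file names:

    // Sketch of the naming scheme above: one file on disk becomes several
    // entries in the archive, distinguished by suffixes that can never be a
    // real Windows file name.
    #include <string>
    #include <iostream>

    std::string EntryName(const std::string& path,
                          const std::string& altStream = "",  // "" = main data
                          bool securityDescriptor = false,
                          bool sparseData = false)
    {
        if (securityDescriptor)  return path + "?sd";
        if (sparseData)          return path + "?sparse";
        if (!altStream.empty())  return path + ":" + altStream;
        return path;                          // the main BACKUP_DATA stream
    }

    int main()
    {
        std::cout << EntryName("Users/bob/report.doc") << "\n";           // main data
        std::cout << EntryName("Users/bob/report.doc", "Zone.Identifier") << "\n";
        std::cout << EntryName("Users/bob/report.doc", "", true) << "\n"; // security
    }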

You have two problems to solve.

1. Directories. Quite often they contain at least security descriptors. But
AFAIK directories just aren’t stored in these TARs.

2. File names and their encoding. AFAIK modern Linux is UTF-8 all the way
down, and TAR is probably OK with that. But if the UTF-8 representation of the
UCS-2 name (WinNT isn’t quite UTF-16; UCS-2 is a subset of it) is longer than
100 bytes, you’ll have multiple files in your TAR with the same name; they’ll
only differ by your extended PAX attributes that contain the complete name. If
you combine that with the above fake names for security info and alternate
streams, it becomes even more complex.

P.S. I’ve been programming for windows for decades, only occasionally for
linux or other platforms.

------
amckinlay
Excellent. But we could have had this years ago had Microsoft maintained an up
to date and standards-compliant C library instead of promoting their insular
and proprietary "visual" C++ ecosystem. In fact, there probably could be a
whole host of GNU/Linux software running on Windows. But no, we can't even get
a complete C99 library.

~~~
slededit
Microsoft moved to C++ a decade ago; the C compiler exists only for backwards
compatibility and generates substandard code compared to the same file
compiled through the C++ compiler. C features are only supported as required
by the C++ standard, although I personally miss VLAs.

~~~
pritambaral
C and C++ are not alternatives, so a platform "moving to" one doesn't mean it
has to forgo the other. Linux, the BSDs, and macOS all have both C and C++
toolchains.

~~~
slededit
I'm just explaining what they did. They obviously could have continued to
support C if they had wanted to. But if you're waiting for C99 support, your
wait is in vain.

These days you can use Clang, so a nice C compiler has returned.

------
smilekzs
> Developers! Developers! Developers!

A laughing stock turned into a respectable trend. Well done.

I think with this, PowerShell needs to stop aliasing curl to Invoke-WebRequest
by default (or at least try to deprecate that alias).

~~~
WorldMaker
The comments on the article address the PowerShell alias: it sounds like the
PowerShell team is hesitant to change that alias and break the users who do
expect curl to alias Invoke-WebRequest, so the suggestion was to get into the
habit of using `curl.exe` as your invocation in PowerShell for real curl.

A deprecation warning would be a good idea, though, to suggest to developers
who are using the curl alias that they move to iwr.

------
sonofgod
Wondering how long it'll be before we have `curl ... | cmd` as standard
installation instructions on Windows.

(Probably a while: command line familiarity is far from a given in Windows)

~~~
13of40
Using PowerShell to download and execute a scriptblock is a common malware
technique right now, but it's not something any sane software vendor would
recommend for installing an application.

~~~
freeone3000
That's what the Linux folks thought, and here we are.

------
JepZ
:D Sounds like there will be no Windows 11 but instead

"GNU Windows" ;-)

~~~
ciupicri
Though bsdtar and curl don't have a GPL license.

~~~
jwilk
I'd rather say: bsdtar and curl are not GNU software.

(Not all GNU software is under GPL.)

------
xellisx
If they would add SSH drive mapping...

------
qplex
What is this ad campaign Microsoft is pulling?

Porting simple UNIX tools from the '70s and '80s to Windows and making a big
fuss about it? Tar? unzip.exe, or whatever equivalent with a GUI, has been
around for ages.

I just can't see how any self-respecting developer would be lured to Windows
by this...

------
rijoja
That's nice but why is the example with zip files?

------
interfixus
The ancients knew this technology. Around 1990, tar.exe came bundled with some
OEM edition(s?) of MS-DOS; I don't specifically remember which.

------
timeimp
Who is this new Microsoft?!

~~~
kylek
New?

https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguish

------
ISL
Still no X, correct?

~~~
Kipters
I don't think it would give them any strategic advantage.

~~~
ISL
It would allow me to function normally on a windows box. I can apt-get install
everything I need except anything that depends on X.

I'd still rather run GNU/Linux from the metal up, but my work depends on
Windows for legacy data-acquisition systems. Right now, I have an experiment
that's running a linux box and windows box side by side in order to get the
best of both. Working out hardware access in a VM sounds unpleasant, so I'd
love to see the native-windows-GNU implementation extended to X.

~~~
saulrh
For the past five or six years my daily driver has run Windows (gaming) while
actual work gets done by forwarding X from a headless Linux box. In my
experience:

* MobaXTerm was usable (on Win7-8, a couple years ago), but IIRC it's surrounded by a faint stench of upsell and I recall being annoyed by its attempt to be a full desktop environment (tabs, file manager, integrated editor, etc). It feels very... windows-y, the X server equivalent of PuTTY.

* VcXsrv had (on Win10, about two weeks ago) a show-stopper bug - windows would fail to redraw outside their original bounds when resized. It also had trouble using modern UI toolkits or themes.

* Cygwin/X is the best, no issues at all if you're in cygwin, but I was unable to get it to cooperate smoothly with bash on windows and ended up deciding that the environment was worth more than the tiny missing features. If you don't need WSL I'd recommend this.

* Xming is what I've settled on. I use the free version, which is a major version behind (6.9 instead of 7.7), but I haven't noticed it lacking anything compared to Cygwin/X. Seems to be just as reliable as Cygwin/X too. Getting it configured was somewhat annoying (had to create xauthority manually, etc), but now that it's set up it's working smoothly.

I am mostly annoyed by two things, both of which are common to the setup and
not the chosen tools:

* I have to put up with Windows for window management. I would much prefer something like i3, but I haven't gotten the fullscreen mode to work to my satisfaction.

* Network issues. You _must_ be wired and you _must_ have spare bandwidth. If you're running gigabit ethernet this isn't really an issue, but if you try to do it over wifi you're going to have a bad time.

Compared to running a virtual machine... it really comes down to picking a set
of annoyances. With a VM you have to deal with configuring another machine, it
devours resources, integration with the host isn't as good, and it can't
tolerate monitor switching at _all_. With native X you don't get a usable WM.
_shrug_

~~~
derekp7
An issue I've had with Xming and Cygwin is that if I RDP to my Windows desktop
from my laptop, sometimes the X server will crash when I go back to my full
desktop (likely caused by the different screen resolution / color depth).

------
a3n
Cygwin.

------
stevemk14ebr
cool. I want wget now

~~~
freeone3000
Invoke-WebRequest is good enough for a lot of use cases. Curl should handle
the rest.

~~~
johnramsden
Got to love their naming scheme: _Invoke-WebRequest url_ just flows off your
fingers.

~~~
lstamour
I think it can be shortened to “iwr” (it’s aliased to “curl”)

~~~
WorldMaker
It's also aliased to wget.

The Verb-Noun verbosity of PowerShell is great for A) discovering new verb and
noun combinations that you hadn't considered before (and Get-Verb gives you a
list of common verbs), and B) keeping scripts readable in the long term.
Almost every Verb-Noun has at least one shortcut alias, most based on common
cmd/bash-isms (Get-ChildItem has gci, ls, and dir), and Get-Help on any Verb-
Noun will list the aliases for you.

~~~
WorldMaker
With respect to (B): I do try to write out the full verbose names (Verb-Noun)
when writing scripts, as a maintenance goal. At that point when you are
writing a script the verbosity is often less of a problem because you can
write in an IDE of your choice that provides strong auto-completion. (You can
also get basic tab completion in the command prompt and use an IDE as your
command prompt, if for some reason you are entirely allergic to the short
aliases and want to spell out the full things even for random REPL work.)

------
mayli
So, you have curl and tar, but no pipe (|) on windows.

~~~
Sophira
Pipes have existed in DOS since before Windows 95 was around!

~~~
Narishma
Since DOS 2.0, IIRC.

~~~
emmelaich
But not really. Using a pipe means it dumps the _entire_ output to a temporary
file and then feeds that temp file to the next program.

I'm not sure if Windows still does this. It probably does.

