Hacker News | ptx's comments

Doesn't this violate the license of the license? The very first paragraph says that "changing it is not allowed" – or "no one can change it" in this changed version.

And calling it "GNU" and attributing it (only) to the Free Software Foundation, when they didn't publish this modified version, seems inappropriate. I don't know if the FSF have trademarks on these terms, but this is the kind of confusion trademarks are meant to protect against.

As a final nitpick, I believe (but I might be wrong about this) the phrase "General Public License" is intended to mean a license for the general public, so changing "General" to "Justified" wouldn't make sense in that case (as it would mean that the public is justified, not the license).

(Neat trick nevertheless.)

I believe you can change it as long as you change the name, which this thing does. (Not sure, though; indeed the text doesn't seem to permit derivatives as is.)

However, keeping GNU in the name might be problematic.

(and yes, I'm impressed that the justification never uses 2 consecutive spaces or other such tricks)

> However, keeping GNU in the name might be problematic.

Problematic?! I would imagine even an L1 could prove that the Justified version is still Not Unix.

Parody is fair use, and this is part of a submission to SIGBOVIK 2024. http://tom7.org/bovex/ It's also clearly not competing with the original, since its contents are partially nonsense.

How do you get a license for the LTSC?

I fake one with vlmcsd.

.NET Core would be great if it weren't tainted by the New Microsoft policy of turning all tools into spyware. Now you have to remember to set DOTNET_CLI_TELEMETRY_OPTOUT=1 everywhere and meticulously ensure that it doesn't get stripped from the environment when launching subprocesses, changing users, etc.
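For illustration, a sketch of the kind of ceremony this forces on launchers that build their own subprocess environment (only the variable name comes from the docs; `child_env` is a made-up helper):

```python
import os

# DOTNET_CLI_TELEMETRY_OPTOUT is the documented .NET CLI opt-out variable.
# Passing env= to subprocess.run() REPLACES the environment wholesale, so
# any code building its own mapping must re-assert the opt-out explicitly
# or it is silently dropped in the child.
def child_env(extra=None):
    env = dict(os.environ)                    # start from the parent env
    env["DOTNET_CLI_TELEMETRY_OPTOUT"] = "1"  # re-assert the opt-out
    env.update(extra or {})
    return env

assert child_env()["DOTNET_CLI_TELEMETRY_OPTOUT"] == "1"
```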

I remember the discussion about that, which boiled down to: it's good for you - shut up.

That was the thing that kicked me over the fence. Open source doesn’t matter if the stewards do not have your best interests at heart.

The list of what is gathered: https://learn.microsoft.com/en-us/dotnet/core/tools/telemetr...

As you can see, it is system data and basic crash reporting, something that helps with the health of the .NET ecosystem and has no monetary value.

They used to gather all command line arguments until they later decided that (oops!) it's "not acceptable per our privacy policies"[0] and they really shouldn't have been doing that. They have also had issues with anonymization not being implemented properly, the opt-out mechanism not working in some edge cases, forgetting to even tell users about the need to opt out, and who knows what else. The risk of accidentally exfiltrating your data one way or another seems pretty high.

Also, monetary value is not the only reason you might want to keep information private.

[0] https://github.com/dotnet/sdk/issues/6145#issuecomment-22010...

The issue dates back to the release candidate of the second version of .NET Core, 8 years ago. Surely we can do better than "it used to be bad at x point in time and this can never be rectified".

I like my privacy more than the next guy, but you are not arguing in good faith (which is all too common, because people like to use much inferior technologies and make .NET a scapegoat, instead of learning better and ceasing idiotic self-sabotage)

Edit: Heh, in HN jail. In response to the comment - you should apply this logic to the languages steered by Google, Oracle and Apple. .NET is completely unrelated to Recall and whatever happens to it; it could just as well have been a separate company.

Microsoft does not make it easy to give them the benefit of the doubt.

See, for example, the Recall debacle. https://doublepulsar.com/recall-stealing-everything-youve-ev...

I've had teachers who didn't understand the subject they were teaching. It's not a good experience and replicating that seems like a terrible idea.

A key advantage is that LLMs don't have emotional states that need to be managed.

The menu key was always useless anyway, because shift+F10 does the same thing.

I believe the windows key is just ctrl-esc

Doesn't work for the shortcut combinations.

But the equivalent pixel value depends on the root element font size, so the comment will be wrong when that changes. If you leave the math to the browser dev tools you'll get accurate results without any AI figuring out patterns.
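A quick sketch of why a hard-coded pixel comment drifts (hypothetical helper, Python purely for illustration): the browser resolves rem against the root element's computed font size, so the comment only matches one particular root size.

```python
def rem_to_px(rem, root_font_size_px=16):
    # Browsers resolve rem against the root element's computed font size.
    return rem * root_font_size_px

assert rem_to_px(1.5) == 24.0       # a "24px" comment is right only at a 16px root
assert rem_to_px(1.5, 20) == 30.0   # the same rem value at a 20px root
```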

Yep, but in our workflow we've never deviated from the 16px base. The comments in the code are purely to help when translating designs to rem, in particular with Tailwind.

Like XMLVM[0] with English instead of XML.

[0] http://www.xmlvm.org/overview/

How do the "special tokens" work? Is this a completely reliable mechanism for delimiting the different parts of the prompt?

Are they guaranteed to be distinct from anything that could occur in the prompt, something like JavaScript's Symbol?

Or are they strings that are pretty likely not to occur in the prompt, something like a MIME boundary?

Or are they literally the strings "<|start|>" etc. used to denote them in the spec?

They are "literally the strings", but I believe they will be escaped, or encoded differently, if a user tries to inject them as part of a prompt.

Yeah the tokens are more akin to JS Symbol.

If you're parsing untrusted user inputs into tokens, you can make sure your tokenizer will never produce the actual numbers corresponding to those tokens.

A simplified example: I can `.charCodeAt` a string all I want but I'll never get a negative number, so I can safely use -1 to mean something special in the transformed sequence of "tokens".
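A toy sketch of that guarantee (made-up token table, not any real tokenizer): user text is encoded into byte IDs 0..255, while control tokens get IDs outside that range, so no input string can ever collide with them.

```python
# Hypothetical special-token table; byte values only span 0..255.
SPECIAL = {"<|start|>": 256, "<|end|>": 257}

def tokenize_user_text(text):
    # Untrusted input only ever maps to byte values, never 256/257,
    # even if it literally contains the string "<|start|>".
    return list(text.encode("utf-8"))

tokens = tokenize_user_text("<|start|>system<|end|> injection attempt")
assert SPECIAL["<|start|>"] not in tokens
assert all(t < 256 for t in tokens)
```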

Do you have any publicly available code demonstrating this pattern?

I don't actually, but it can be explained in a few lines of code. Consider the following two simple functions:

    def ref(obj):
        return id(obj)

    def deref(addr):
        import ctypes
        return ctypes.cast(addr, ctypes.py_object).value
Basically, this relies on an implementation detail of `id()` in CPython: the unique id of an object is its memory address. `ref()` returns a reference to an object (think `&` in C), and `deref()` dereferences it back (think `*` in C). This is close to the standard `weakref` module in essence, but weakref is a black box.

Now even though the call stack is cleared upon fork of the worker processes, you still have the parent objects available, properly tracked and refcounted, as you can check with `gc.get_objects()`. This is in fact a feature of `gc`, as explained in the docs (https://docs.python.org/3/library/gc.html):

> If a process will fork() without exec(), avoiding unnecessary copy-on-write in child processes will maximize memory sharing and reduce overall memory usage. This requires both avoiding creation of freed “holes” in memory pages in the parent process and ensuring that GC collections in child processes won’t touch the gc_refs counter of long-lived objects originating in the parent process. To accomplish both, call gc.disable() early in the parent process, gc.freeze() right before fork(), and gc.enable() early in child processes.
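A minimal sketch of that exact pattern from the docs (POSIX-only, since it uses `os.fork()`; the `long_lived` list is just a stand-in for real parent objects):

```python
import gc
import os

gc.disable()                     # parent: avoid creating freed "holes" in pages
long_lived = [bytearray(1024) for _ in range(100)]  # stand-in parent objects
gc.freeze()                      # park tracked objects in the permanent generation
pid = os.fork()
if pid == 0:
    gc.enable()                  # child: collections skip the frozen objects
    os._exit(0)
os.waitpid(pid, 0)
gc.enable()                      # parent resumes normal collection
gc.unfreeze()                    # optional: thaw the frozen generation again
```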

Now whenever you want to send large objects to a `multiprocessing.Pool` or `concurrent.futures.ProcessPoolExecutor`, you can avoid expensive pickling by just sending these references.

    from multiprocessing import Pool

    class BigObject:
        def compute_something(self):
            ...  # some expensive computation over the big object

    def child(rbo):
        bo = deref(rbo)  # resolve the reference inside the forked worker
        return bo.compute_something()

    def parent():
        bo1 = BigObject()
        bo2 = BigObject()
        with Pool(2) as pool:
            # pass cheap integer addresses instead of pickling the objects
            result = pool.map(child, [ref(bo1), ref(bo2)])
In a real codebase though, there are some caveats around this. You cannot take the reference of just anything, there are temporaries, cached small integers, etc. You will need some form of higher level wrapper around `ref()` to properly choose when and what to reference or to copy.

Also it may be inconvenient to have your child functions explicitly dereference their parameters; it will force you to write _dereference wrappers_ around your original functions. A good strategy I've used is to create a proxy class that stores a reference and overrides `__getstate__`/`__setstate__` to pickle itself as a reference and unpickle itself as a proxy. That way, you can transparently pass these proxies to your original functions without any modification.
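A hypothetical sketch of such a proxy (using the same `id()`/ctypes trick as above; `Proxy` and `BigObject` are made-up names, and the real parent object must stay alive for the address to remain valid):

```python
import ctypes
import pickle

class Proxy:
    def __init__(self, obj):
        self._obj = obj

    def __getstate__(self):
        # Pickle only the address; the real object stays in the parent.
        return id(self._obj)

    def __setstate__(self, addr):
        # Resurrect a reference to the (still-alive) parent object.
        self._obj = ctypes.cast(addr, ctypes.py_object).value

    def __getattr__(self, name):
        # Delegate attribute access, so callers need no explicit deref().
        return getattr(self._obj, name)

class BigObject:
    def compute_something(self):
        return 42

big = BigObject()
# Round-tripping through pickle simulates what a Pool does to arguments.
p = pickle.loads(pickle.dumps(Proxy(big)))
assert p.compute_something() == 42
```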

Oh, I see. You want to avoid serializing the objects since they will be copied anyway with fork(), but the parent needs a way to refer to a particular object when talking to the child, so it needs to pass some kind of ID.

You could also do it without pointers and ctypes by using e.g. an array index as the ID:

    inherited_objects = []

    def ref(obj):
        object_id = len(inherited_objects)
        inherited_objects.append(obj)  # register it, or the index is meaningless
        return object_id

    def deref(object_id):
        return inherited_objects[object_id]
Although this part needs a small change as well, so that the object ID is assigned before forking:

    def parent():
        bo1 = BigObject()
        bo2 = BigObject()
        refs = list(map(ref, [bo1, bo2]))
        with mp.Pool(2) as pool:
            result = pool.map(child, refs)

> You want to avoid serializing the objects since they will be copied anyway with fork(), but the parent needs a way to refer to a particular object when talking to the child, so it needs to pass some kind of ID.

Yes, that is exactly and succinctly the crux of the idea :-)

As you found out, you can rely on indices or keys in a global object to achieve the same result. The annoying part, though, is that you need to pre-provision these objects before the pool, and clean them up afterwards to avoid keeping references to them. That means some explicit boilerplate every time you use a pool.

The nice thing with the id() trick is that it's very unintrusive for the caller, as the reference count stays the same in the parent process, it is only increased in the child, unbeknownst to the parent.

Not actually "in Excel", though. The Python code runs on Microsoft's servers (they say in the introduction) and Excel is just a client.

There's no reason they couldn't embed CPython in Excel, but maybe the intention was for the online version of Excel to have feature parity without having to compile Python to JavaScript?

The intention is to lock orgs into their cloud services. This is a value-add. They know full well that Excel and Word are "feature complete", and the only way they're going to make money on them is by harvesting data and locking in the users.
