Hacker News — mzl's comments

I tried Gemini Pro 2.5 with a lot of context: all the documentation for a system and several papers of interest, then asked it to use the system to implement the solution proposed in the papers. The total context was over 500k words, so by the usual estimates probably over 700k tokens.

The answers started out ok, but fairly quickly it seemed to lose track of the middle parts of the documentation, insisting on using one concept instead of another even when I explicitly told it not to. Full attention on a 1M context is not really feasible (I don't believe that Google actually stores upwards of 1T of data just for my query), and there are various ways LLMs use selective attention. I'm not sure whether Google has published anything on how they do it.


How would a typing system know if the right type is `int` or `Optional[int]` or `Union[int, str]` or something else? The only right thing is to type the argument as `Any` in the absence of a type declaration.

The typing system should use the most specific valid type, and the code author can broaden it with explicit typing if needed. No good typing system should ever infer a variable as `Any`: "I don’t know the type of this" (`unknown` in TypeScript) is not the same thing as "This function accepts anything". Conflating these things is one of the main reasons why Mypy is so annoying.

A typing system should only infer things that it knows are true; it should never invent restrictions. In a language like Python that is duck-typed, `Any` is the only reasonable choice in the absence of other real constraints like a type annotation.
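A minimal Python sketch of the distinction being argued over (the function names are hypothetical; the behavior described in comments is how a checker such as Mypy treats unannotated parameters by default):

```python
from typing import Optional

def first_or_default(xs, default):
    # No annotations: a checker like Mypy treats xs and default as Any,
    # so every call site type-checks and mistakes only surface at runtime.
    return xs[0] if xs else default

def first_or_default_typed(xs: list[int], default: Optional[int]) -> Optional[int]:
    # Explicit annotations: the author, not the inferencer, decided the
    # argument is Optional[int] rather than int, Union[int, str], etc.
    return xs[0] if xs else default
```

The untyped version runs identically; the difference is only in what a static checker can verify at call sites.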

For a very long time, Scala was stuck at JDK 8, which hindered anyone who had a mix of Java and Scala code from upgrading.


From the linked web page:

    A cat written with UUOC might still be preferred for readability reasons, as reading a piped stream left-to-right might be easier to conceptualize.[14] Also, one wrong use of the redirection symbol > instead of < (often adjacent on keyboards) may permanently delete the content of a file, in other words clobbering, and one way to avoid this is to use cat with pipes.
I personally think the cat-style is easier to read since it only uses commands and pipes, with no need to keep track of redirection directions.


Before the launch, Spotify had a deal with the music rights holders' association in Sweden (STIM) that they could use a merged collection of friends' and families' music libraries. All this was removed before Spotify went out of beta.

So while Spotify was using pirated media, the use was sanctioned by the rights holders for the experiment of building Spotify.


The standard heuristic any 2D packing algorithm starts with is left-bottom or bottom-left (the choice is symmetrical). Place a piece as far to the left as possible, and among equal choices as low as possible (or vice versa, as the choice was in the linked article). From this base heuristic, further meta-heuristics can be developed with choices for the order in which to pack parts, how to choose among "almost equal" positions, choosing among rotations and symmetries, and so on.

As far as I can tell, the "Skyline algorithm" is bottom-left with the limitation that created holes cannot be packed. This doesn't feel like a packing algorithm as much as it feels like a heuristic to speed up position finding.
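The base heuristic described above can be sketched in a few lines of Python (a naive grid scan for illustration, not the skyline speedup from the article; `bottom_left_pack` and `overlaps` are hypothetical helpers):

```python
def overlaps(a, b):
    """Axis-aligned rectangle overlap test; rectangles are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def bottom_left_pack(pieces, bin_w, bin_h):
    """Place each (w, h) piece at the lowest position, breaking ties to
    the left. Returns a list of (x, y, w, h), or None if a piece fits
    nowhere. O(bin area) per piece -- the skyline idea exists to avoid
    exactly this brute-force position scan."""
    placed = []
    for w, h in pieces:
        spot = None
        for y in range(bin_h - h + 1):        # lowest first
            for x in range(bin_w - w + 1):    # then leftmost
                cand = (x, y, w, h)
                if not any(overlaps(cand, p) for p in placed):
                    spot = cand
                    break
            if spot:
                break
        if spot is None:
            return None
        placed.append(spot)
    return placed
```

Swapping the two loops gives the symmetric left-bottom variant; the meta-heuristics mentioned above would then vary the order of `pieces` and the tie-breaking.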


I created a 3D bin packer at work with a basic heuristic similar to the one you described, plus some additional flavor for testing some number of different combinations during each iteration (and choosing the path that had the best results up to that point).

The interesting thing I found is that the simple heuristics work pretty well. I actually had to dial back the level of optimization because it took too long for the human packers to duplicate the packing correctly to be able to close the carton.

I had to set it to a level that retained enough wasted space to let them quickly find their own solution, while still knowing for sure that all of the stuff would fit into carton size X, because saving an extra few dollars in shipping didn't offset the labor dollars to pack it that way.


If one was sufficiently inspired by code A when writing code B, then it is a derivative work. This is a core tenet of copyright law.

At what point is one sufficiently inspired for it to be a derivative work? That is up to courts to decide.


Yeah, the problem here is that there usually is no single 'code A'; it is more like thousands of pieces of GPLed code (A1, A2, ..., An).

Technically, when you take a piece from each, there is no infringement legally (as they all have different copyright holders).


From my understanding of a blog post by GitHub last year, they are planning to launch a tool to find code similar to what is emitted by CoPilot, implying that CoPilot does not mix multiple sources for a single function, but derives a code block it found with similar functionality (or maybe bigger blocks with similar functionality, IDK).

If CoPilot indeed derives a function (or a functional block) from a single source, it might plainly violate the license of the repository where it derives the code from.

There are many questions, and nothing is clear cut. The only thing I know is, I will never use that thing.

EDIT: I remembered that people were able to make CoPilot emit their code almost as-is with the correct prompts: https://x.com/docsparse/status/1581461734665367554

So it's not that it's taking a bit from n different sources and generating something from that.


> Technically when you get a piece from each, there is no infringement legally.

False in ex-Commonwealth countries and Japan.


Daylight saving time is not used in a majority of countries. See the map on Wikipedia: https://en.wikipedia.org/wiki/Daylight_saving_time


Why haven't they caught up to the west yet? Why are they doing it wrong?


> academia is mostly about shitting out as many papers as you can

This is the classic case of publish-or-perish, since publication metrics are unfortunately ubiquitous in all aspects of academic life. Measuring true impact is the goal, but it is a hard problem to solve.

> and make them as verbose as possible

This is just laughably wrong. Page limits are always too low to fit all the information one wants to include, so padding a text is simply not of interest at all.

With that said, I wouldn't be surprised if people use ChatGPT a lot. If for no other reason, most academics are writing in a language (English) that is not their native language, and that is hard to do. Anything that makes the process of communicating one's results easier and more efficient is a good thing. Of course, it can also be used to create incomprehensible word salads, but I saw a lot of those in pre-LLM times as well.


Yes, and if you recreate parts of what you learned, you might run into copyright issues. Too much inspiration from something you've studied, and it becomes a derivative work subject to all the regulations. And there are no clear and strict rules; it is always a judgement call.

