
COVID really pumped up the market. Lots of hiring at insane TCO, especially for unproven talent. A correction had to happen. As someone else said, if you have the experience and skills and fall into the comp range companies are willing to pay, you'll be fine. If you're missing any of that, it's going to be rough. My company is hiring like crazy, but only for very specific dev roles.


Answer is still no, and still for the above reason. Compute resources are only relevant to how fast it can answer, not the quality.


Then why does chain of thought work better than asking for short answers?


Because it’s a better prompt. Works better for people too.


That's not the only reason.

More tokens = more useful compute towards making a prediction. A query with more tokens before the question is literally giving the LLM more "thinking time".
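
A rough back-of-the-envelope (every dimension below is made up, not any particular model's config): each extra token adds another slice of matrix math to the forward pass, and the attention part grows quadratically with prompt length.

    # Rough per-layer FLOP estimate for one forward pass over n tokens,
    # using invented transformer dimensions (illustrative only).
    def approx_layer_flops(n, d_model=4096, d_ff=16384):
        qkv_proj = 3 * 2 * n * d_model * d_model   # project tokens to Q, K, V
        attn = 2 * 2 * n * n * d_model             # QK^T and softmax(QK^T)V, quadratic in n
        mlp = 2 * 2 * n * d_model * d_ff           # feed-forward block
        return qkv_proj + attn + mlp

    for n in (100, 1000, 4000):
        print(n, f"{approx_layer_flops(n):.2e}")
    # Longer prompts mean strictly more compute per layer, whether or not
    # the extra tokens carry any new information.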


It correlates, but the intuition is a bit misleading. What's actually happening is that asking a model to generate more tokens increases the amount of information present in its context block that it has learned to make use of.

It's why "RAG" techniques work: the models learn during training to make use of information in context.

At the core of self-attention is a dot-product similarity measurement, which causes the model to act like a search engine.

It's helpful to think about it in terms of search: the shape of the output looks like conversation, but we're actually prompting the model to surface information from the QKV matrices internally.

Does it feel familiar? When we brainstorm we usually chart graphs of related concepts e.g. blueberry -> pie -> apple.
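
To make the search framing concrete, here's a toy numpy sketch where random vectors stand in for learned keys and values (purely illustrative, not a real model): the dot-product scores behave like search ranking, and the output ends up being mostly a copy of whichever value best matches the query.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 64                                        # embedding width (made-up)
    keys = rng.normal(size=(10, d))               # ten "facts" sitting in context
    values = rng.normal(size=(10, d))             # what each fact contributes if retrieved
    query = keys[3] + 0.1 * rng.normal(size=d)    # a query that resembles fact #3

    # Scaled dot-product attention: similarity scores act like search ranking.
    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    output = weights @ values                     # mostly a copy of values[3]
    print(weights.round(3))                       # weight concentrates on the matching key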


>What's actually happening is that asking a model to generate more tokens increases the amount of information present in its context block that it has learned to make use of.

I'm not saying this isn't part of it, but even if it's just dummy tokens without any new information, it works.

https://arxiv.org/abs/2310.02226


It’s not clear that more tokens are better.


I think it's pretty clear

https://arxiv.org/abs/2310.02226

I mean, I can imagine you wouldn't always need the extra compute.


This paper is a great illustration of how little is understood about this question. They discovered that appending dummy tokens (ignored during both training and inference) improves performance somehow. Don’t confuse their guess as to why this might be happening with actual understanding. But in any case, this phenomenon has little to do with increasing the size of the prompt using meaningful tokens. We still have no clue if it helps or not.


I just found this paper I read a while ago. Doesn't this answer the question?

The Impact of Reasoning Step Length on Large Language Models - https://arxiv.org/abs/2401.04925

>They discovered that appending dummy tokens (ignored during both training and inference) improves performance somehow. Don’t confuse their guess as to why this might be happening with actual understanding.

More tokens means more compute time for the model to utilize; that is completely true.

What they guess is that the model can utilize the extra compute for better predictions even if there's no extra information to accompany this extra "thinking time".


Yes, more tokens means doing more compute, that much is true. The question is whether this extra compute helps or hurts. This question is yet to be answered, as far as I know. I tend to make my GPT-4 questions quite verbose, hoping it helps.

This is completely orthogonal to CoT, which is simply a better prompt - it probably causes some sort of better pattern matching (again very poorly understood).


>The question is whether this extra compute helps or hurts.

I've linked 2 papers now that show very clearly the extra compute helps. I honestly don't understand what else it is you're looking for.

>This is completely orthogonal to CoT, which is simply a better prompt - it probably causes some sort of better pattern matching (again very poorly understood).

That paper specifically dives in on the effect of the length of the CoT prompt. It makes little sense to say "oh, it's just the better prompt" when CoT prompts with more tokens perform better than shorter ones, even when the shorter ones contain the same information. There is also a clear correlation between task difficulty and length.


Yes, the CoT paper does provide some evidence that a more verbose prompt works better. Thank you for pointing me to it.

Though I still don’t quite understand what is going on in the dummy tokens paper - what is “computation width” and why would it provide any benefit?


So "compute" includes just having more data ... that can also be "ignored"/ "skipped" for whatever reasons (e.g. weights), ok.


I have a theory that the results are actually a side effect of having the information in a different area of the context block.

Models can be sensitive to the location of a needle in the haystack of their input block.

It's why there are models which are great at single turn conversation but can't hold a conversation past that without multi-turn training.

You can even corrupt the outputs by pushing past the number of turns it was trained on, or by showing the model data in a form it hasn't really seen before.


> Models can be sensitive to the location of a needle in the haystack of their input block.

But only if we use some sort of attention optimization. For the quadratic attention algo it shouldn’t matter where the needle is, right?


It's not a loss if they're still net positive from their unrealized gains. It's basically the home ownership version of "I know what I've got", but in fact they don't.


> I’ve heard similar tactics being used at other companies–mostly large companies–and it’ll only continue in 2024 as they make decisions that drive short term profits over all else.

When you tie leadership incentives to short-term profits, that's the only type of decision making that will be done.


How can Amazon be guilty of incentivizing short term profit when their profit margin history looks like this?

https://www.macrotrends.net/stocks/charts/AMZN/amazon/profit...

Compare to Alphabet/Microsoft/Apple/Meta’s 20%+ profit margins.


* https://lmstudio.ai

LM Studio is far superior these days.


LM Studio uses llama.cpp under the hood, so if you don't need a fancy UI, you are probably better off running that.
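
As a sketch of how little you need on top of llama.cpp: the llama-cpp-python bindings wrap it directly (the model path below is a placeholder for whatever GGUF file you have locally, not a real file).

    # Minimal sketch using the llama-cpp-python bindings over llama.cpp;
    # the model path is a placeholder for your own GGUF file.
    from llama_cpp import Llama

    llm = Llama(model_path="./models/your-model.Q4_K_M.gguf", n_ctx=4096)
    out = llm("Q: What does llama.cpp do? A:", max_tokens=128, stop=["Q:"])
    print(out["choices"][0]["text"])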


Absolutely hilarious exchange that I hope highlights the truth of the original comment!


Is there any well known instance where a private equity firm taking over a business/service resulted in a net positive for customers and not just the firm being a leech?


I would add another caveat: "and for the employees", as one of the key plays by PE seems to be to lay off as many people as they can possibly get away with.


Dell went private for a bit so they could pivot without having Wall St complain about a few unprofitable quarters.


How, and who is affected.


> If a bunch of firms use the same software to set prices but do so independently and without any attempt to coordinate with each other… that’s a pretty decent defense?

The software enabled coordination without rental management companies/owners needing to put in effort to do so. In the end, all they cared about was maximizing profits, which in itself isn't illegal, no matter how unethical it might be. The key here is proving that they knew what was going on. I'm sure they did, all things considered. It's been going on for years.


If prices went up and occupancy went down, it would be difficult to claim otherwise. Why let units sit empty?


The argument here (not saying I agree with it) is that it's common to let units sit empty over particular periods, based on other business decisions and an understanding of the market.

E.g., if you have a lease that ends in September, the make-ready process takes 2-4 weeks, so the unit will be available for lease again in October. There's not a high churn of residents in October-December due to holiday periods, etc., and demand doesn't pick up again until January. Therefore, it's better for you to let that unit sit empty now, because demand is so much lower for the next 2-3 months that you won't be able to get the same increase in lease price as you will when demand skyrockets in January.
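
With made-up numbers, the break-even for that October scenario works out roughly like this (purely illustrative, nothing from the actual filing):

    # Illustrative only: lease in October at a softer rate, or hold the unit
    # vacant through December and lease at a premium in January?
    rent_now = 1800          # $/month if leased in October (made-up number)
    months_vacant = 3        # October through December
    horizon = 15             # compare both paths over the same 15 months

    lease_now = rent_now * horizon
    breakeven_jan = rent_now * horizon / (horizon - months_vacant)
    print(lease_now, round(breakeven_jan))   # January rent must top ~$2250/month
    # to make the vacancy pay off under these assumptions.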

Another example is an understanding that some kind of amenity will change over a period. E.g., a new public transport route is going in, or the property next door is being rezoned and demolished, going from an industrial site to a commercial precinct. Waiting for that to happen means you can get a higher price on your lease once it does, to the extent that it offsets the loss from leaving the unit vacant.

The defendants will argue that the analysis provided by the plaintiff doesn't consider these kinds of business decisions and that this part of the argument should be considered flawed.


Vacancy rates nationwide are near all time lows. I’d need more evidence to accept that occupancy went down or landlords left units empty.


No, you're an art "producer" in the same fashion as music producers who are not musicians but still create music.


Honest question: Where is the line drawn between music producers and musicians? Playing instruments? Using samples?


Areas of responsibility. The producer is the person ultimately responsible for the output. You can be a producer and not know a lick about good musicianship or play an instrument, but you have good taste and can use tools (samples, software, etc) to take what others have done to make something others will like/use/pay for/etc.


If you can put together (tasteful) songs by using tools like samples and software, aren't you a musician then? Many would consider tools like DAWs instruments, so mastering one of those would be considered playing an instrument, at least for those people.


> Many would consider tools like DAWs instruments...

People can stretch the definition however they like since there is no hard consensus on where the line is drawn. Sure, the general term "musician" means a person who plays/creates music. A person who hums can be a musician.

However, in my own opinion, I disagree. DAWs are not instruments in themselves. The main functionality of a DAW is in recording, arranging, and mixing audio. They are the digital equivalent of a tape recorder and a manual cut/splice process. They have evolved to include many software add-on emulations of instruments and packages around the production process: synths, pianos, mastering tools, additional recording tools, etc. Using a DAW does not make you a musician, but using the instrumentation provided in a DAW can make you one.


You can be a software engineer and not know a lick about syntax or write algorithms, but you have good intuition and can use tools (LLMs, autocomplete, etc) to take what others have done to make something others will like/use/pay for/etc.


Why use tools? Just be a product manager and you'd attain the same result.


The times I've been to low/no-tipping countries, the service was literally indistinguishable from what I experience in the U.S.

