
If you want one line of code, in J, for the 42 solutions:

   _ 11 11 #:,I. 0=,a+/,+/~  a=: 2 4 1 3 8 7 _3 _1 12 _5 _8
Or the 8 solutions in a 2x12 matrix:

   2 12 $, ~. (/:~)"1 ({&a)  _ 11 11 #:,I. 0=,a+/,+/~  a=: 2 4 1 3 8 7 _3 _1 12 _5 _8

   _3 1 2 _5  2 3 _1 _1 2 _8  4 4
   _5 1 4 _3 _1 4 _8  1 7 _5 _3 8
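
Reading the train right to left, a rough gloss (plus a last line that just counts the hits; this is only my reading of the verbs, so double-check it):

   a =: 2 4 1 3 8 7 _3 _1 12 _5 _8
   NB. +/~ a                the 11x11 addition table, a[j]+a[k] for every pair j,k
   NB. a +/ , +/~ a         each a[i] added to every pairwise sum, i.e. all a[i]+a[j]+a[k]
   NB. I. 0 = , ...         the flat indices where such a triple sums to 0
   NB. _ 11 11 #: ...       each flat index decoded back into its index triple i,j,k
   # I. 0 = , a +/ , +/~ a  NB. just counting the hits should give the 42 ordered solutions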


I hope/fear this HRM model is going to be merged with MoE very soon. Given the huge economic pressure to develop powerful LLMs, I think this can be done in just a month.

The paper seems to study only problems like Sudoku solving, and not question answering or other applications of LLMs. Furthermore, they omit a section on future applications or on fusion with current LLMs.

I think anyone working in this field can envision its applications, but the details of combining MoE with an HRM model could be their next paper.

I only skimmed the paper and I am not an expert; surely others will/can explain why they don't discuss such a new structure. Anyway, my post is just blissful ignorance of the complexity involved and of the impossible task of predicting change.

Edit: A more general idea is that Mixture of Experts is related to clusters of concepts, and now we would have to consider clusters of concepts related by the time they take to be grasped. So, in a sense, the model would have in latent space an estimate of the depth, number of layers, and time required for each concept, just as we adapt our reading style for a dense math book versus a newspaper short story.


This HRM is essentially purpose-designed for solving puzzles with a small number of rules interacting in complex ways. Because the number of rules is small, a small model can learn them. Because the model is small, it can be run many times in a loop to resolve all interactions.

In contrast, language modeling requires storing a large number of arbitrary phrases and their relation to each other, so I don't think you could ever get away with a similarly small model. Fortunately, a comparatively small number of steps typically seems to be enough to get decent results.

But if you tried to use an LLM-sized model in an HRM-style loop, it would be dog slow, so I don't expect anyone to try it anytime soon. Certainly not within a month.

Maybe you could have a hybrid where an LLM has a smaller HRM bolted on to solve the occasional constraint-satisfaction task.


> In contrast, language modeling requires storing a large number of arbitrary phrases and their relation to each other

A person has some ~10k word vocabulary, with words fitting specific places in a really small set of rules. All combined, we probably have something on the order of a few million rules in a language.

Which, yes, is more than the model in this paper can handle, but is nowhere near a problem that should require something the size of a modern LLM. So it's well worth trying to enlarge models with other architectures, trying hybrid models (note that this one is necessarily hybrid already), and exploring every other possibility out there.


What about many small HRM models that solve conceptually distinct subtasks, determined and routed to by a master model that then analyzes and aggregates the outputs, with all of that learned during training?


I must say I am suspicious in this regard, as they don't show applications other than a Sudoku solver and don't discuss downsides.


And the training was only on Sudoku, which means they need to train a small model for every problem that currently exists.

Back to ML models?


I would assume that training an LLM would be unfeasible for a small research lab, so isn't tackling small problems like this unavoidable? Given that current LLMs have clear limitations, I can't think of anything better than developing better architectures on small test cases; then a company can try scaling them later.


Not only on Sudoku: there is also maze solving and ARC-AGI.


This piece sounds like a second-order LLM: the first-order one generates text and the second-order one filters and optimizes it. But, in fact, this work is done by a human being.

The human reviewer asks you to proofread your output before submission and claims to be able to detect any AI slop. I wonder whether this is true, and if so, for how long. Maybe it will be replaced by a GAN-style LLM, and then the loop will be closed.


Why do I program? I program as a hobby, but I am always looking for an idea or concept that can be framed as a program so I can obtain wealth.

The TFA claims "Sometimes the hardest part is maintaining focus and not chasing every shiny new thing", and I agree.

I think you have to go beyond programming, since programming is just a tool for a higher-order goal: for example, designing a solution to a problem.

But I haven't found the way yet.


This is the most common mistake engineers make. Code is not worth anything. Solving a user's problem, which they're willing to pay for (not just any problem), is what can be converted to wealth. The intersection of these two is very small and very dense, since all engineers aim for it.

If you venture out of that region and try to discover and solve problems (and if needed use code/automation/tech), you have a surer chance of generating wealth.


> The TFA claims "Sometimes the hardest part is maintaining focus and not chasing every shiny new thing", and I agree.

In a logical world yes, but often the majority of jobs want people who have experience with the shiny new thing.


Sorry, but I don't get the meaning of your phrasing. I think that to use AI you must be very explicit and clear about what you want to design, and if Lisp provides some advantages, one should accurately define the specific tool to use and when, how, and why.

I recall Norvig mentioning that other languages have taken many ideas from Lisp; those languages are also in the new civilization. Just to give an example: destructuring-bind, apply, and others are now done in JavaScript with a shorter syntax, and JavaScript, even without macros, has excellent speed.


The quoted portion is a reference to an XKCD strip from earlier this century, https://xkcd.com/297/, which is a reference to Star Wars.

Their use of L1 and L2 should be read with "L" as "level": L1 is the lower level, L2 is the higher level. They're suggesting Ada (or some other well-suited language) for the lower-level trusted systems language and Lisp for the application language.

What it has to do with AI, I don't know. People want AI everywhere now.


Exactly. So let's expand. A good reason to have AI everywhere is that it is capable of giving you a fair answer for just about anything. So ask it to do some data analytics stuff, like what Tableau or PowerBI can do. It can provide maybe 60% of the same functionality that most users require (provided data access, blah blah). Ask it to do patient pre-triage. It will get you within 60% of a ballpark answer. Ask it to diagnose a car problem, or a crop rotation plan. Once again, it gets you in the ballpark. So what I'm suggesting is, the current state of the art has no Dunbar limitation and no bias toward any particular domain. It's like a 10k-person team that doesn't care what it's solutioning (L1). Generalize the L1 to provide high-assurance foundational functionality (workflows, custom work items, some general way and tools to get from a strategic opinion to an executable fact).

People are still limited by Dunbar's number, so they need domain-specific vocabularies to help them describe solutions to smaller groups. Maybe that is a direction exploitable by Lisp at the L2 level.

But with an AI-native L1, it doesn't care about the domain but would need to hold up the whole organization. Ada-level assurance. So it produces a 60% solution that has to be consumable by any particular L2. Multiple enterprise apps with a common base layer. No need to provide connectors or bridging apps for separate ERP, SCM, BI, and HR vendors. Complete line of sight, real-time analytics, and real-time budget adjustments, eliminating the need for budget cycles. It's kind of the Deus Ex God app. It deprecates the need for separate Salesforce, Oracle Fusion, and Tableau apps, separate vendor expenses, etc.


If a case doesn't match, then speeding up the remaining 0.1% is not going to change the overall speed much (for example, if 99.9% of calls take 1 µs and the other 0.1% take 10 µs, even making the slow path free improves the average by only about 1%). Hence, a non-optimized algorithm may be enough. But when speed and speed variance are critical, optimization is a must.


The "overall speed" is rarely all that matters.


Perhaps instead of coding many small one-day projects, one could program one-day projects that compose with each other. For example, I was thinking about developing a library that implements a version of J in Common Lisp (but I think fuel is lacking), so that the one-day project named random-sample could be just:

  randomSample =: 4 :'(x ? #y){y' NB. can't repeat.

  randomSample =: 4 :'(? x # #y){y' NB. can repeat.

So that, in many cases, one-day projects could be reduced to one- or two-line definitions (for those who know J; that is the caveat).
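
Usage would look something like this (just a sketch, not run; the results are random, so yours will differ):

  3 randomSample 10 20 30 40 50   NB. three distinct items, e.g. 40 10 30
  7 randomSample 10 20 30 40 50   NB. only the repeating version allows x > #y, e.g. 20 50 20 10 40 30 10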


I think they're showcasing existing projects instead of making a new one each day.


Thanks, it seems you are right. The random-sample project seems so short that I thought it was a one-day project and that the others would also be short projects.


Not J, but there is an APL compiler made in Common Lisp here: https://github.com/phantomics/april


Thanks for the April reference. In the post [1] there are just 11 comments, and it also seems the intended audience could be maybe two dozen people! Not very encouraging for creating a J in Common Lisp.

[1] https://news.ycombinator.com/item?id=22225136


I'd be a very interested admirer, if not user, of such a project. I've been playing with J this past week, and otherwise have a couple of CL books under my belt. No other real programming experience, but I certainly think that sounds like a cool idea.

I wonder where April would fit in, with your idea? Joining forces with the fellow who made April might be a possibility. Strength in numbers, and all that.

