
> The most impressive thing humans can do is to think.
> And the best kind of thinking, or more precisely the best proof that one has thought well, is to make good new things.
> ... but making good new things is a should in the sense that this is how to live to one's full potential.

I urge you not to take these opinions as facts. Originality is admirable, but it is not "your potential", "proof of great thoughts", or "the most impressive thing you can do".

The answer to the question "What to do?" is not "Make new things". Rather, it begins with a simple question: in what context?

The idea of dividing people into two categories, 1) those who "take care of people and the world" and 2) those who "make good new things", is harmful.


Does it support other Scheme implementations? Or are there at least any plans to do so?


Porting to a different Scheme implementation requires some effort: schemesh needs a good, bidirectional C FFI and an (eval) that allows any Scheme form, including definitions.

To create a single `schemesh` executable with the usual shell-compatible options and arguments, the Scheme implementation also needs to be linkable as a library from C:

Chez Scheme provides a `kernel.o` or `libkernel.a` library that you can link into C code; you then call the C functions Sscheme_init() and Sregister_boot_file(), and finally Scall0(some_scheme_repl_procedure) or Sscheme_start().
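
As a rough illustration (untested as written; the boot file paths and the REPL procedure name are just placeholders, and I'm also calling Sbuild_heap to build the initial heap before running any Scheme code), an embedding looks roughly like this:

    /* Minimal, untested sketch of embedding Chez Scheme from C.
       Link against kernel.o or libkernel.a; boot file paths below are
       placeholders and depend on the installation. */
    #include "scheme.h"   /* Chez Scheme C API header */

    int main(int argc, const char *argv[]) {
        Sscheme_init(NULL);                    /* NULL = default abnormal-exit handler */
        Sregister_boot_file("./petite.boot");  /* placeholder path */
        Sregister_boot_file("./scheme.boot");  /* placeholder path */
        Sbuild_heap(argv[0], NULL);            /* build the initial Scheme heap */

        /* Hand control to the standard Chez REPL/CLI ... */
        return Sscheme_start(argc, argv);

        /* ... or instead call a specific Scheme procedure, e.g.
           Scall0(Stop_level_value(Sstring_to_symbol("some-scheme-repl-procedure"))); */
    }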


Will YC and startups be more or less agreeable to remote work? Will the new administration cause changes to remote work?


I can't comment on YC and startups, but while the administration will likely make changes to the H-1B law and process, I don't see major changes in how remote work is viewed and treated.


In general, a compiler takes source code and generates object code or an executable. Can you elaborate on what your compiler takes as input and generates as an output?


Hello! Thanks for your question.

First of all, there are three layers of abstraction within Caten:

1. caten/apis | High-Level Graph Interface
2. caten/air | Low-Level Graph Interface
3. caten/codegen | AIR Graph => Kernel Generator

The inputs of the compiler are just Common Lisp classes (similar to torch modules). For example, in Common Lisp, we could create a module that does SinCos:

    (defclass SinCos (Func) nil
      (:documentation "The func SinCos computes sin(cos(x))"))

    ;; Forward creates a lazy tensor for the next computation.
    ;; You can skip this process by using the `st` macro.
    (defmethod forward ((op SinCos) &rest tensors)
      (st "A[~] -> A[~]" (tensors)))

    ;; Backward is optional (skipped this time)
    (defmethod backward ((op SinCos) &optional prev-grad)
      (declare (ignore prev-grad))
      nil)

    ;; Lower describes the lowered expression of `SinCos`
    (defmethod lower ((op SinCos) &rest inputs)
      (let ((x (car inputs)))
        (with-context
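          ;; note: sin(x + pi/2) = cos(x), so the graph below builds sin(cos(x))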
          (a (%sin (%add x (%fconst (/ pi 2)))))
          (b (%sin a)))))
The `apis` layer is the high-level interface, while the `lower` method is the lower-level step before code generation.

Next, the framework generates an Abstract VM (AVM) representation:

    #S(AVM :GRAPH Graph[seen=NIL, outputs=(STC6466_1)] {
      <ALLOCATE : TID6464 <- (shape=(1), stride=(1)) where :dtype=FLOAT32>
      <Node[BUFFER] ALLOCATE(NID6480) : SID6479* <- ()>
      <Node[BINARYOPS] ADD(NID6484) : BID6483* <- (TID6464, LID6481)>
      <Node[UNARYOPS] SIN(NID6486) : UID6485* <- (BID6483)>
      <Node[UNARYOPS] SIN(NID6488) : UID6487* <- (UID6485)>
      <Node[SPECIAL/VM] PAUSE/BACKWARD(NID6501) : STC6466_1* <- (UID6487)>
    })
Then, the computation graph is translated into schedule items:

    FastGraph[outputs=(val_6)] {
      { Allocate } : [ val_0 <- (1) ]
      { KERNEL } : [ val_5 <- val_1, val_0 :name=FUSED_SIN_SIN_ADD_LOAD6511]
    }
Finally, the code generation step produces the following C code:

    void fused_sin_sin_add_load6511(float* val_5, const float* restrict val_0);
    void fused_sin_sin_add_load6511(float* val_5, const float* restrict val_0) {
        val_5[0] = sin(sin((val_0[0] + 1.5707964)));
    }
This C code is compiled by a C compiler and executed.

So to answer your question: the compiler takes Common Lisp code and generates C functions.


Few compilers generate object code. Most generate an intermediate form, often assembler, which is converted to object code by another tool.


What are some other applications of WebAssembly that you find promising?


The promise of WebAssembly is bringing actual large scale applications to the browser.

To be clear, it is possible to build large apps with technologies like TypeScript, with VSCode being probably the standard bearer of this approach, but at Leaning Technologies we don't believe this solution to be viable for everybody.

We strongly believe that more "traditional" languages are required instead to work at scale, especially C++ and Java, and we offer products dedicated to these languages as well: Cheerp and CheerpJ respectively.

Although these languages might not be the most exciting ones, they have a proven track record for delivering large scale applications used by millions of people. Billions of people if the Operating Systems we all use are taken into account.

Thanks to WebAssembly these robust and scalable languages can be used to build Web apps. My own personal metric for the success of WebAssembly is seeing AAA games delivering playable demos as Web apps by cross-compiling (a portion of) the game to the browser.


> I found no evidence of someone having invented it before.

Most Lisp/Scheme implementations offer tracing. Amazingly, the SICP book even shows tracing as a means of analyzing whether a recursive function is tail-recursive.

What I'm curious about is the Lisp history of tracing. Does anyone know which implementations were the first to provide tracing functionality?


In C land, function entry/exit traces (or the reconstruction of one) are a prerequisite for time travel debugging, which has been commercially available for over 20 years.

If we are talking prologue/epilogue hooks, those have been commandeered (just like how Cosmopolitan hijacked them) for bespoke entry/exit tracing for who knows how long.

If we are talking generic (non-bespoke), automatically instrumented, whole-program, strippable/injectable entry/exit tracing, then that has been commercially available for maybe a decade.

The reason it is not very popular is that naive implementations have high overhead, and the trace data is such a deluge of information that you need specialized visualization tools to make any sense of non-trivial programs.

That is because your average C program is going to do something like 10-100 million function calls per second per core in normal operation. I repeat: 100 million logging statements per second. Can your normal logging dependencies handle that? Even with just a 64-bit timestamp and nothing else, that is nearly 1 GB/s of logging data, consuming a significant fraction of memory bandwidth just for logging. You want optimized high-speed logging with efficient binary log formats to support that data rate without incurring immense overhead.

The visualization problem is not as bad these days either. The time travel debugging vendors had to handle even more ridiculous amounts of data, so they pioneered trace visualization techniques that work on even larger, denser data. Everybody else has since copied these techniques to make well-known trace visualizers such as perfetto. That is why they work so well across the gamut of tracing implementations (no matter how dense), despite most web-development tracers generating such anemic data: they lifted techniques from visualizers that actually needed to invent how to effectively visualize the hardest use cases.


Cosmopolitan Libc --ftrace on my workstation logs 1 million lines per second for a program written in C like `python -c "print('hello')" --ftrace`. If I do clang-19 --ftrace which is written in C++ then it does 476 thousand lines per second. That goes half as fast because the Cosmo Libc privileged runtime has to demangle C++ symbols before printing them in the log. Note I'm using `export KPRINTF_LOG=log`. It's hard to believe it goes so fast considering it's a single thread doing this and kisdangerous() has to lock the memory manager and search a red-black tree every time a pointer is dereferenced (e.g. unwinding stack frames to print space indentation) by kprintf() or ftracer(). If the Linux kernel write() syscall didn't have so much overhead it'd probably go faster.


Yeah, if you instead dumped a dense binary format into the log and demangled/decoded on display, you should be able to get much higher throughput.

A separate decode step instead of going straight to the console is not so bad, because the log throughput is already high enough that you cannot read it live; switching to a post-mortem dump with a dedicated decode step offers you a lot of options for efficiency.

If you want to go further, since you would no longer emit directly to the console, you can relax flushing requirements and batch more data before persisting. You could also go for a dedicated log, make it single-producer/single-consumer, or optimize the logging itself in other ways.

You could probably get 10x on just basic encoded formatting. The next 10x would likely require logging and storage optimizations. It is likely that your OS and storage devices would become your limiting factor at that point. The last 10x to get to the billion events per second per core level is turbo black magic and requires substantive changes to the compiler, operating system, and runtime to achieve.
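
As an illustration of what a dense binary format with batched flushing might look like (this is only a sketch of the idea, not Cosmopolitan's --ftrace or any vendor's actual format; the record layout, buffer size and log file descriptor are arbitrary):

    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Fixed-size 24-byte records appended to a big in-memory buffer and
       flushed in batches; symbol resolution/demangling is deferred to an
       offline decoder. */
    struct trace_rec {
        uint64_t tsc;      /* timestamp (e.g. rdtsc or clock_gettime)        */
        uint64_t fn;       /* raw function address; resolved/demangled later */
        uint32_t depth;    /* call depth, so the decoder can re-indent       */
        uint32_t flags;    /* bit 0: 0 = entry, 1 = exit                     */
    };

    #define TRACE_CAP (1u << 20)            /* ~24 MiB of records per batch */
    static struct trace_rec g_buf[TRACE_CAP];
    static size_t g_len;
    static int g_fd = 2;                    /* placeholder log file descriptor */

    static void trace_flush(void) {
        /* one big write per batch; error handling omitted in this sketch */
        ssize_t rc = write(g_fd, g_buf, g_len * sizeof g_buf[0]);
        (void)rc;
        g_len = 0;
    }

    static void trace_emit(uint64_t tsc, void *fn, uint32_t depth, uint32_t is_exit) {
        if (g_len == TRACE_CAP)
            trace_flush();
        g_buf[g_len++] = (struct trace_rec){ tsc, (uint64_t)(uintptr_t)fn, depth, is_exit };
    }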


It is a seemingly outlandish claim.

Even in just the C world, for instance, profiling tools are able to trace all the function calls to determine how many times they are called, how much time is spent, and to recover the call graph information.

The --ftrace option in Cosmopolitan shares a name with a Linux kernel function tracing mechanism.

What Justine may mean is that it's done in some new, unusual way; e.g. other than patching the initial instructions of functions to branch somewhere.


The first Lisp implementation in 1960 already had function tracing.


Tail-call optimization is very important when writing Scheme programs. Without it, you lose the power of recursion.

Also, when it comes to macros, does that include `syntax-rules` or `syntax-case` style macros? The latter are much more powerful.

While an embedded Scheme-like language is incredibly useful, at some point I feel as if you would simply have to include these features, and at that point it would just be Scheme reinvented.


> I want to get out of this learning loop where I keep experiencing that I know nothing so I keep learning and I think I am good for it and then I apply for jobs and the loop repeats itself.

So? Just stop learning. Close that book, video, course. Go make something people want. Or you want. What would you like to make?


When I was about to graduate, I wanted to make an app for mute people to talk through their phone (not many people know sign language here), but my professor curbed the idea. Afterwards I didn't know where to start, and this is why I wanted to get a job, so I could learn how real-life projects and things work.

But you are right, I need to get past my fear of failure.


Making something helps you get hired. Not only is it a show of expertise, it is also an expression of your unique personality. What is more, I asked what you want to make right now. Even people afraid of failure make things.


If I am being honest, a living. No project comes to my mind at all. Just that I want to work. I love coding and can spend hours on it. If someone asks me to make something, I am all for it. But nothing comes to mind on my own. It's just plain numb at this point. A project that I have started working on is a scraper that will have Google Sheets integration; I will then use it to make insights, plus I have some other plans for scaling this project.


I am not sure if I follow correctly. Please clarify the following for me. Have you succeeded in making a tool which undoes the work of the linker?


I did; you can find the Ghidra extension here: https://github.com/boricj/ghidra-delinker-extension

The problem is properly identifying the relocation spots and their targets inside a Ghidra database, which is based on references. On x86 it's fairly easy, because there's usually a 4-byte absolute or relative immediate operand within the instruction that carries the reference. On MIPS it's very hard because of split MIPS_HI16/MIPS_LO16 relocations, where the actual reference can be hundreds of instructions away.
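
To illustrate the split (this snippet is just an explanation for readers, not code from the extension): a 32-bit address is carried by two separate instructions, and neither half is meaningful on its own.

    #include <stdint.h>
    #include <stdio.h>

    /* A MIPS address load is typically emitted as a pair such as:
     *     lui   $t0, %hi(sym)        ; carries the R_MIPS_HI16 half
     *     lw    $t1, %lo(sym)($t0)   ; carries the R_MIPS_LO16 half
     * The low half is sign-extended at run time, so the high half is
     * rounded up to compensate, and the matching lo16 instruction can be
     * far away from its hi16 partner. */
    static void split_hi_lo(uint32_t addr, uint16_t *hi, int16_t *lo) {
        *lo = (int16_t)(addr & 0xffffu);           /* sign-extended low half */
        *hi = (uint16_t)((addr + 0x8000u) >> 16);  /* rounded-up high half   */
    }

    int main(void) {
        uint16_t hi;
        int16_t lo;
        split_hi_lo(0x12349abcu, &hi, &lo);
        /* what the CPU effectively computes when the pair executes */
        uint32_t rebuilt = ((uint32_t)hi << 16) + (uint32_t)(int32_t)lo;
        printf("hi=0x%04x lo=0x%04x rebuilt=0x%08x\n",
               (unsigned)hi, (unsigned)(uint16_t)lo, (unsigned)rebuilt);
        return 0;
    }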

So you need both instruction flow analysis strong enough to handle large functions and code built with optimizations, and pattern matching for the various possible instruction sequences, some of them overlapping and others looking like regular expressions in the case of accessing multi-dimensional arrays. All of that while trying to avoid algorithms with bad worst cases, because they'll take too long to run on large functions (each ADDU instruction generates two paths to analyze because of the two source registers).

Besides that, you're working on top of a Ghidra database mostly filled by Ghidra's analyzers, which aren't perfect. Incorrect data within that database, like constants mistaken for addresses, off-by-n references or missing references will lead to very exotic undefined behaviors by the delinked code unless cleaned up by hand. I have some diagnostics to help identify some of these cases, but it's very tricky.

On top of that, the delinked object file doesn't have debugging symbols, so it's a challenge to figure out what's going wrong with a debugger when there's a failure in a program that uses it. It could be an immediate segmentation fault, or the program can work without crashing but with its execution flow incorrect or generating incorrect data as output. I've thought about generating DWARF or STABS debugging data from Ghidra's database, but it sounds like yet another rabbit hole.

I'm on my fifth or sixth iteration of the MIPS analyzer, each one better than the previous one, but it's still choking on kilobytes-long functions.

Also, I've only covered 32-bit x86 and MIPS on ELF for C code. The matrix of ISAs and object file formats (ELF, Mach-O, COFF, a.out, OMF...) is rather large. C++ or Fortran would require special considerations for COMMON sections (vtables, typeinfos, inline functions, default constructors/destructors, implicit template instantiations...). Also, you need to mend potentially incompatible ABIs together when you mix and match different platforms. This is why I think there's a thesis or two to be done here; the rabbit hole is really that deep once you start digging.

Sorry for the walls of text, but without literature on this I'm forced to build up my explanations from basic principles just so that people have a chance of following along.


It is not easy to follow. And there are many things worth discussing. I understand there are complications with MIPS and C++ just to name a few.

But let me stick with some basics. So, I can write and compile an x86 test.c program. Then, I use your extension and undo the linking. Then, I use the results to link again into a new executable? Are the executables identical? When does it break?

How much of a task is it to make it a standalone program? What about x64 support?


> But let me stick with some basics. So, I can write and compile an x86 test.c program. Then, I use your extension and undo the linking. Then, I use the results to link again into a new executable?

There are links in the README of my Ghidra extension repository that explain these use-cases in-depth on my blog, but as a summary:

- You can delink the program as a whole and relink it. This can port a program from one file format to another (a.out -> ELF) and change its base address.

- You can delink parts of a program and relink them into a program. This can accomplish a number of things, like transforming a statically-linked program into a dynamically-linked one, swapping the statically linked C standard library for another one, making a port of the program to a foreign system, creating binary patches by swapping out functions or data with new implementations...

- You can delink parts of a program and turn them into a library. For example, I've ripped out the archive code from a PlayStation game built by a COFF toolchain, turned it into a Linux MIPS ELF object file and made an asset extractor that leverages it, without actually figuring out the archive file format or even how this archive code works.

You can probably do even crazier stuff than these examples. This basically turns programs into Lego blocks. As long as you can mend them together, you can do pretty much anything you want. You can also probably work on object files and dynamically-linked libraries too, but I haven't tried it myself.

> Are the executables identical?

Probably not byte-identical, but you can make executables that have the same observable behavior if you don't swap out anything in a manner that impacts it. The interesting stuff happens when you start mixing things up.

> When does it break?

Whenever the object file produced is incorrect or when you don't properly mend together incompatible ABIs. The first case happens mostly when the resynthesized relocations are missing or incorrect, corrupting section bytes in various ways. The second case can happen if you start moving object files across operating systems, file formats, toolchains or platforms.

> How much of a task is it to make it a standalone program?

My analyzers rely on a Ghidra database for symbols, data types, references and disassembly. You can probably port/rewrite that to run on top of another reverse-engineering framework. I don't think turning it into a standalone program would be practical because you'll need to provide either an equivalent database or the analyzers to build it, alongside the UI to fix errors.

> What about x64 support?

Should be fairly straightforward since I already have 32-bit x86 support, so the bulk of the logic is already there.

I encourage you to read my blog if you want to get an idea of how this delinking stuff works in practice. You can also send me an email if you want; Hacker News isn't really set up for long, in-depth technical discussions.


This sounds like it would benefit from modifications to linkers to make decomposition easier. The benefits of code reuse might make it worthwhile, although the security implications of code reuse without having any idea of what's in the code seem formidable.


Some linkers can be instructed to leave the relocation sections in the output (-q/--emit-relocs for gold/mold), but it's extremely unlikely that an artifact you would care about was built with this obscure option.

I'm mostly using this delinking technique on PlayStation video games, Linux programs from the 90s and my own test programs, so I'm not that worried about security implications in my case. If you're stuffing bits and pieces taken from artifacts with questionable origins into programs and then execute them without due diligence, that's another story.


If it sounds too good to be true, it probably is not.

If removing weights improves some metrics, that may be a clue that the model is not optimal in some sense.


The algorithm still uses all the weights, just not all the time - it just skips the weights when they are not important given an input vector.

Also, approximation methods, as a field, are not new, and they have proven their usefulness.

Having said all that, extraordinary claims require extraordinary evidence - that's why I hedge my messaging. It's "probably" until we get serious tests going.
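
To sketch the general idea in one place (this is only an illustration, not the actual implementation; a real version decides importance cheaply up front instead of computing the very products it skips):

    #include <math.h>

    /* Illustration only: for a given input vector, only the terms judged
       important for that input take part in the dot product.  The threshold
       plays the role of the "effort" knob. */
    float approx_dot(const float *w, const float *x, int n, float threshold) {
        float acc = 0.0f;
        for (int i = 0; i < n; i++) {
            if (fabsf(w[i]) * fabsf(x[i]) < threshold)
                continue;              /* skip weights unimportant for this input */
            acc += w[i] * x[i];
        }
        return acc;
    }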


The metric that's improved is computation speed, and it's achieved by essentially changing the computation (by not performing some computation that likely doesn't have a large impact on the results).

Given that it's a different computation, you could argue that Mistral+effort is a new model with the improved metric of quality per amount of computation performed.

Otherwise - given that for every different input there's a separate set of weights in the model that are excluded - I don't think you could conclude from this (if it holds up etc etc) that the base model is not optimal.

In a similar sense, quantization improved the "quality per model size" metric, but I don't think people are arguing that Mistral is less optimal than quantised Mistral (unless you're speaking about literally that metric). On the other hand, if you're targeting that metric specifically, then it would make perfect sense to say that quantised Mistral is more optimal for it.

I guess it comes down to optimality being dependent on the metric you're looking at, and there being many things you might want to optimise for.

To note again, if this technique holds up, it's better than model distillation (just get rid of some of the weights) because for some inputs those weights could matter and this technique should (iiuc) account for that somewhat. To me, this is what it sounds like you're referring to when saying:

> If removing weights improves some metrics, that may be a clue that the model is not optimal in some sense


Yeah, more tests are needed. I got some feedback on using KL divergence instead of the token similarity - initial tests seem to show that it is workable (compared to Q8), but not awesomely amazing - I will be working on that next week and publishing the results.

As for treating effort+Mistral as a separate model - I wouldn't do that comparison. The model stays the same, all the weights from it are still being used, just not all of the time - we don't really lose information from the source model.


Wait, does the second "it" refer to the true part? Because traditionally, it refers to the "too good" expression. So you'd say, "it _is_ too good to be true".

