phkahler's comments

I just converted a big fixed-point algorithm to float on a Cortex-M4F. It runs at very nearly the same speed but is significantly more readable.

In the fixed-point code I used straight C but wrote my own functions for sin, cos, and reciprocal square root.

I can't see the appeal of getting just a 2x improvement over soft float when using LLVM-specific features and modifying libraries eliminates portability.


Cortex-M4F has an FPU, i.e., hardware floating point support (albeit single precision only). You likely would see a much greater performance drop without it.

>> Cortex-M4F has an FPU, i.e., hardware floating point support

Maybe I wasn't clear. I know it has an FPU. I originally wrote the code in fixed point because I knew that would be fast and didn't trust the FPU performance on such a small chip. Now that I've converted it to float, I get the same performance.

I also used an "f" suffix on all constants to avoid accidental software double precision, and used the GCC flags to force everything to single precision and link the hard-float libs.
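For reference, something along these lines (the exact -mcpu/-mfpu values and the rest of the command line are a sketch, not my exact setup; they depend on the part and toolchain):

    /* Sketch only - FPU name and flags depend on the part/toolchain:
       arm-none-eabi-gcc -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard \
                         -fsingle-precision-constant -Wdouble-promotion -O2 ...
    */
    float scale(float x)
    {
        return x * 0.1f + 0.5f;   /* 'f' suffixes keep constants single precision */
    }

-Wdouble-promotion is handy too: it warns whenever a float is silently promoted to double.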

On smaller chips I used to expect fixed point to be faster than a hardware FPU, but that's no longer the case. They are about equal for me.


> On smaller chips I used to expect fixed point to be faster than a hardware FPU, but that's no longer the case.

What kind of chip do you have in mind where that would be true?

Notably, M4 is the first of its line to have an FPU option at all.


I think I misread the comment (I thought you meant that a 2x improvement seemed like too much, but perhaps you meant the opposite).

This page[1] claims the M4F manages about 25 MFLOPS vs 1.5 MFLOPS for the plain M4 thanks to the FPU, at least in single-precision mode.

[1]: https://s-o-c.org/cortex-m4f-vs-m4-how-do-these-arm-cores-co...


>> Yup, switched from embedded (on high-speed trains), to modern backend, x4 my salary.

These comments leave me confused. My Michigan-based understanding would have you both poorly paid for embedded work and obscenely paid for backend work. I know some folks in CA make $200K and up; I'm just not sure how common that is. And anyone doing embedded should be making at least $100K, so I don't see an easy 4x.


Ah, the explanation is quite simple and I should've included it, sorry. I'm working in France, so I was at more or less 25k in embedded (not poor by French standards for my age, and with a lot of extra perks, like free train travel, compared to other companies, but not great either). I doubled when I started working for a French startup, and doubled again when I started working for a US-based startup.

Oh you changed countries in there too! That explains it ;-)

>> but when software is essential to your system, you should have that team at the table in your planning and budgeting efforts.

I like to tell them that software should be on the BoM even though the marginal cost is $0.00. Having it noted as a component helps get people on board with proper versioning, and also raises visibility for something otherwise unseen.


>> so I would like to hear what kinds of projects and challenges people who may have actually been employed doing work like this have come across.

I've been in and out of embedded for over 25 years now. Mostly in. Some things I've worked on over the years: battery monitoring for a Ford EV back in 1997, control software for the Ibot (Dean Kamen's stair-climbing wheelchair), a few small prototype gizmos, electric power steering systems (thanks to my early motor control experience on the Ibot), engine controller test code, airbag controllers (very boring, it's mostly endless diagnostics), AC/DC converters mostly for EV charging but also for other uses (up to 900 kW), electric water pumps, and something really fun that I can't tell you about yet. I've focused on motors because I accidentally became really good at Field Oriented Control, and having that specialization pays well while being applicable to a bunch of different things - different products, but also the power conversion work, which was very closely related.

That's just the embedded side. I've bounced back to PC apps a few times in completely unrelated areas. It probably helps to be an engineer first and a software person second. That avoids the notion of being a "coder" and gets me involved more at the product design and development level.

I know a guy who has done a lot of embedded audio DSP work. He's an older guy too and just had a few months out of work. It was fairly easy for him to find a job though and he's happy doing something new. Audio is an area where embedded and AI are actually coming together. If that's your thing, try diving into speech recognition on the Pi or something and then scale DOWN to smaller hardware. IMHO on-device voice control is going to become a mildly big deal in certain areas.

Be flexible and always try to work on interesting things!


The kid's kite is flying backwards...

>> A user of an LLM might give the model some long text and then say "Translate this into German please". A Transformer can look back at its whole history.

Which isn't necessary if you instead say "Translate the following to German." Then all it needs is to remember the task at hand and a much smaller amount of recent input. Well, that and the ability to output in parallel with processing input.


It's necessary for arbitrary information processing if you can forget and have no way to "unforget".

A model can decide to forget something that turns out to be important for some future prediction. A human can go back and re-read or re-listen; a Transformer is always re-reading, but an RNN can't and is fucked.


If these networks are ever to be a path to something closer to general intelligence, they will need to be able to ask for context to be repeated, or to have separate storage where they can "choose" to replay it themselves. So this problem likely has to be solved another way anyway, both for transformers and for RNNs.

For a transformer, the context is already being repeated at every token. It can fetch information that turns out to be useful whenever it wants. I don't see what problem there is to solve here.

For a transformer, context is limited, so the same kind of problem applies after you exceed some size.

That's just because we twisted its arm. One could, for example, feed the reversed input afterwards, i.e. abc|cba where | is a special token. That would allow it to react to any part of the message.

I think this might be key, in addition to some landmark tokens to quickly backtrack to. The big question is how to train such a model.

There is a recent paper from Meta that proposes a way to train a model to backtrack its generation to improve generation alignment [0].

[0] https://arxiv.org/html/2409.14586v1


Also, a lightweight network could do a first pass to identify tasks, instructions, constraints, etc., and then a second pass could run the RNN.

Consider the flood fill algorithm or union-find algorithm, which feels magical upon first exposure.

https://en.wikipedia.org/wiki/Hoshen%E2%80%93Kopelman_algori...

Having 2 passes can enable so much more than a single pass.
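As a rough illustration (toy grid, hypothetical names, not a tuned implementation), here's a minimal two-pass labeling sketch in C in the spirit of Hoshen-Kopelman: the first pass hands out provisional labels and only records which ones touch, and the second pass resolves them all with union-find, which a single left-to-right pass can't do on its own.

    #include <stdio.h>

    #define W 5
    #define H 4
    #define MAXL (W * H + 1)

    static int parent[MAXL];

    static int find(int x)                    /* follow links to the root label */
    {
        while (parent[x] != x)
            x = parent[x] = parent[parent[x]];
        return x;
    }

    static void unite(int a, int b) { parent[find(a)] = find(b); }

    int main(void)
    {
        int grid[H][W] = {
            {1,1,0,0,1},
            {0,1,0,1,1},
            {1,0,0,0,0},
            {1,1,0,1,0},
        };
        int label[H][W] = {0};
        int next = 1;

        /* Pass 1: assign provisional labels; just remember which labels touch. */
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++) {
                if (!grid[y][x]) continue;
                int up   = (y > 0) ? label[y - 1][x] : 0;
                int left = (x > 0) ? label[y][x - 1] : 0;
                if (!up && !left)    { parent[next] = next; label[y][x] = next++; }
                else if (up && left) { label[y][x] = left; unite(up, left); }
                else                 { label[y][x] = up ? up : left; }
            }

        /* Pass 2: resolve every provisional label to its root and print. */
        for (int y = 0; y < H; y++) {
            for (int x = 0; x < W; x++)
                printf("%2d", label[y][x] ? find(label[y][x]) : 0);
            printf("\n");
        }
        return 0;
    }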

Another alternative could be to have a first pass make notes in a separate buffer while parsing the input. The bandwidth of the note taking and reading can be much much lower than that required for fetching the billions of parameters.


People did something similar to what you are describing 10 years ago: https://arxiv.org/abs/1409.0473

But it's trained on translations, rather than the whole Internet.


>> These models are very small even by academic standards so any finding would not necessarily extend to current LLM scales.

Emphasis on not necessarily.

>> The main conclusion is that RNN class networks can be trained as efficiently as modern alternatives but the resulting performance is only competitive at small scale.

Shouldn't the conclusion be "the resulting competitive performance has only been confirmed at small scale"?


Yes, that is indeed clearer. However, S4- and Mamba-class models have also performed well at small scale and then started lagging with larger models and longer contexts, or on particular tasks.

That sounds reasonable. But sometimes I have a bunch of signals with some of them being pairs, so having a dark and light version of the same color helps to see them together. Does this work with adjacent rainbow colors as well?

You can use the same hue with different luminance.

>> Step 2:

>> Avoid any CLI tool that uses escape sequences for 8bit or 24bit colours by default.

I was going to point out that the author never takes a step back and asks "What would be the best way to handle this?" The problem there is we have to define what "best" is. IMHO that involves a number of principles. My preferences are:

1) Any user customization should be in one place.

2) The impact on programs should be minimal (in LoC for example).

Both of those suggest the solution belongs in the terminal.

IMHO it starts with terminal programs having sane default colors. What that means is fuzzy, but so is this whole discussion. Colors should follow the "standard" so that blue is still recognizably blue, but consideration should be given to the common forms of color blindness - for example, I have a hard time reading pure red on black (adding a bit of anything helps; don't just use #ff0000).

Once terminals get fixed to have sane defaults, CLI programs should use those 16 standard colors. Any attempt to use 24-bit color here is either saying "I give up on getting those terminal folks to offer sanity" or something like "I know best", but either way users end up with N programs they have to configure. Let's not define themes in CLI apps, OK? Remember, this is my answer to "What would be the best way to handle this?"
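To make the distinction concrete, here's a tiny sketch (plain C, illustrative only): the first line asks for the terminal's idea of "red", which the user can theme; the second hard-codes an exact RGB value and bypasses whatever the terminal is configured to show.

    #include <stdio.h>

    int main(void)
    {
        /* SGR 31: one of the 16 standard colors - the terminal decides what "red" looks like */
        printf("\033[31mthemeable red\033[0m\n");

        /* SGR 38;2;r;g;b: 24-bit color - hard-codes an exact RGB, ignoring the user's theme */
        printf("\033[38;2;255;0;0mhard-coded red\033[0m\n");
        return 0;
    }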

I have similar thoughts when it comes to web sites and fonts. Present content in HTML so users can configure how they want to see it. Similarly for page formatting: it's not a magazine layout, so let it flow.

Also stuff in desktop software: IMHO Wayland compositors should remember window placement. It was stupid for every X program to store and restore its own window position. Wayland says knowing about the environment is a security issue (and I agree), but then it becomes the DE's job to handle this memory. It also relieves ALL the apps of having code for this.

There are other areas where that question comes up: "Where in the software stack should this thing be handled?" Whatever your opinion, I believe you should start by answering the questions around that word "should". What are the goals in selecting where a thing gets handled? My answers always lean toward simplicity and maintainability. What other principles might I adopt to answer these questions?


> What would be the best way to handle this?

A styling service from systemd.


>> For example, an AI diary manager can only organise your diary if it can access that diary, edit it, and retain information about your activities.

Edit a diary? Retain the information? A diary IS a log of information as understood at the time. It's also not the cloud's business, nor something to be edited.

My gosh these people sound stupid.


Diary in UK English can mean “Planner” or “Datebook”, and the article is from the BBC.

https://separatedbyacommonlanguage.blogspot.com/2007/04/diar...


