
They're really incredibly useful for writing emulators. You have to simulate 3-8 processors all running in parallel, but doing so with locks and mutexes tens of millions of times a second is excruciatingly slow and painful, so you have to do this in a single thread (unless you're talking about very modern designs with looser expectations around cycle-level timing).

Cooperative threads like this let you completely avoid having to develop state machines for each cycle within state machines for each instruction, etc. They let you suspend a thread four levels into the call stack, and then immediately resume at that point once other emulated processors have caught up to it in time. That lets you do fun tricks like only synchronizing components when required, so it can in some instances end up not only far more elegant, but also much faster than state machines, when they're used well.
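
In code, the core of it looks something like this (a heavily simplified sketch using libco; the real scheduler juggles many more chips and handles clock overflow and so on):

    #include <libco.h>
    #include <stdint.h>
    #include <stdio.h>

    static cothread_t host, cpu, ppu;
    static int64_t cpu_clock = 0, ppu_clock = 0;   // emulated time, in master clock ticks

    static void cpu_main(void) {
      for (int i = 0; i < 10; i++) {
        cpu_clock += 6;                            // pretend one instruction costs 6 ticks
        // only synchronize once the CPU has run ahead of the PPU in emulated time
        if (cpu_clock > ppu_clock) co_switch(ppu); // suspends here, resumes here later
      }
      co_switch(host);                             // hand control back to the host program
    }

    static void ppu_main(void) {
      for (;;) {
        ppu_clock += 4;                            // pretend one dot costs 4 ticks
        printf("ppu at tick %lld\n", (long long)ppu_clock);
        if (ppu_clock >= cpu_clock) co_switch(cpu);
      }
    }

    int main(void) {
      host = co_active();
      cpu  = co_create(65536, cpu_main);
      ppu  = co_create(65536, ppu_main);
      co_switch(cpu);                              // runs until cpu_main switches back
      return 0;
    }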

I wrote a bit more about this and showed some examples here if anyone's interested: https://near.sh/articles/design/cooperative-threading

I also use them for my web server because I like them, but there are probably better ways of doing that.


Seems like adopting async/await throughout would provide the same benefits (letting you cooperatively yield whenever you want) while maintaining the performance of the state machine (since that's what async/await is in a single-threaded context).


The key thing is that I need to be able to suspend 3-5 layers deep into the call stack. The instruction dispatcher calls into an instruction, which calls into a bus memory read function, which triggers a DMA transfer that then needs to switch to the video processor; and then I need to resume right there inside the DMA transfer function once the video processor has caught up in time. So the full stack that each fiber/thread gets is essential.
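
Stripped down, the shape of it is something like this (libco again; the function names here are made up for illustration, not higan's actual API):

    #include <libco.h>
    #include <stdint.h>
    #include <stdio.h>

    static cothread_t host, cpu, ppu;
    static int64_t cpu_clock = 0, ppu_clock = 0;

    static uint8_t dma_transfer(void) {
      // three calls deep in the CPU's stack: suspend right here until the PPU catches up
      while (ppu_clock < cpu_clock) co_switch(ppu);
      return 0x42;                                    // made-up transferred value
    }

    static uint8_t bus_read(uint32_t addr) {
      cpu_clock += 8;                                 // a bus access costs emulated time
      return dma_transfer();                          // this read happens to trigger a DMA
    }

    static void execute_instruction(void) {
      uint8_t value = bus_read(0x2100);
      printf("cpu read %02x at tick %lld\n", value, (long long)cpu_clock);
    }

    static void cpu_main(void) {                      // the instruction dispatcher
      for (int i = 0; i < 3; i++) execute_instruction();
      co_switch(host);
    }

    static void ppu_main(void) {
      for (;;) {
        ppu_clock += 4;                               // render one dot
        if (ppu_clock >= cpu_clock) co_switch(cpu);   // resumes inside dma_transfer()
      }
    }

    int main(void) {
      host = co_active();
      cpu = co_create(65536, cpu_main);
      ppu = co_create(65536, ppu_main);
      co_switch(cpu);                                 // (sketch leaks the cothreads; a real program would co_delete them)
      return 0;
    }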


It is, but these cores are almost exclusively not being done that way. Not yet, at least. I hope that they will be; that would be really awesome. I paid $1200 last year for the SNES PPUs to be decapped for this purpose, but it's a truly enormous undertaking to map out those chips and then recreate them in Verilog. You're talking thousands of hours of work per chip. If anyone reading this is able to help with that effort, please do let me know, we could really use the help.


Not that this is necessarily helpful to you in the short term, but it strikes me as a good problem for machine learning (going from die pictures to a transistor schematic).


By decapped, do you mean delidded?

Theoretically it would be possible to automate this with a couple things:

- USB electron microscope to image the transistor topology

- CV lib to identify connections and generate corresponding Verilog code


“Decapping” is a more intense version of delidding where you use chemical agents or something similarly extreme (laser, plasma, milling) to remove the package (ceramic, plastic).

My understanding is that there are people who do it often enough that it is automated in the way you describe, but you still need someone with a lot of skill to spend serious time on it. Computer vision works wonders but there are errors which must be identified and fixed.

A lot of the chips people care about can be done just fine optically, no electron microscope needed.


Ah, that’s a good distinction. I’d be pretty scared of damaging the hardware by doing that, but I’m sure there are some really experienced folks out there that would appreciate the hardware donation.


Decapping destroys the hardware.


It is indeed an amazing project, especially its open source nature. It provides some impressive power savings and latency reductions that are very hard to match with general purpose CPUs.

But in most cases, it is emulation, as the lead developer will attest.

https://github.com/MiSTer-devel/Main_MiSTer/wiki/Why-FPGA

"From my point of view, if the FPGA code is based on the circuitry of real hardware (along with the usual tweaks for FPGA compatibility), then it should be called replication. Anything else is emulation, since it uses different kinds of approximation to meet the same objectives. Currently, it's hard to find a core that can truly be called a replica – most cores are based on more-or-less functional recreations rather than true circuit recreation. The most widely used CPU cores – the Z80 (T80) and MC68000 (TG68K) – are pure functional emulations, not replications. So it's okay to call FPGA cores emulators, unless they are proven to be replicas."

But there's nothing wrong with emulation for preservation, until we get to a point where we can clone these older chips at scale, down to the transistor level, through analysis of delayered decap scans. And even then, emulation will be useful for artificial enhancements as well as for understanding how all those transistors actually worked at a higher level.

It's also not a total solution: because it takes many more transistors to programmatically simulate just one, it limits the maximum scale and frequency of what it can support. N64/PS1/Saturn support has not been completed yet and is still theoretical, though likely possible. Going beyond that is not possible at this time.

Software emulation and FPGA devices should be seen as complementary approaches, rather than competitive. The developers of each often work together, and new knowledge is mutually beneficial.


Ah ok I wasn't aware of this. I thought it was spot on.

And yeah, I hope we can easily order small batches of ICs (at a large pitch, of course) in a few years, in a similar way to how creating PCBs has become so simple now.

I mean I remember how much of a PITA it was in the 80s. Drawing on overhead sheets. All the acids and other chemicals. Drilling. And now we get super-accurate 10x10cm boards dual-layer, drilled, soldermasked and silkscreened for a buck a pop with a minimum of 10. Wow. I really hope this trend continues down to the scale of ICs (or that FPGAs simply get better/easier).

By the way, emulating a CPU is pretty easy and very accurate anyway. The big problem with accurate emulation is with some of the peripheral ICs, which used hard-to-emulate stuff like analog sound generators.


> It's also not a total solution: because it takes many more transistors to programmatically simulate just one, it limits the maximum scale and frequency of what it can support. N64/PS1/Saturn support has not been completed yet and is still theoretical, though likely possible. Going beyond that is not possible at this time.

The limiting factor here is the amount of stuff you can throw into a single FPGA, correct?

So in theory, shouldn't it be possible to tie a bunch of FPGAs together, with two beefy ones being responsible for replicating CPU / GPU functionality, a couple smaller ones for sound and other "helper" processors, and some bog-standard ARM SoC to provide the bitstreams to the FPGAs and emulate storage (game cartridges, save cards) and input elements (mainly "modern" controllers)?


There's both a cost and a speed barrier to it. FPGAs are often used to design, simulate, and test modern circuits at sub-realtime speeds. No amount of FPGAs will get you a PS2 emulator at playable speeds right now, let alone a PS3/Switch emulator. PCs can do that today by taking shortcuts such as dynamic recompilation and idle loop skipping.


Hmm... looking at the frequencies and gate counts, I think the PS2 is well within the realm of possibility to run on a not-so-cheap FPGA (or several). But PS3-generation consoles, definitely not.


> The limiting factor here is the amount of stuff you can throw into a single FPGA, correct?

And the speed that you can get your design to run at. Something like the GameCube (PPC750 @ 485 MHz) would be difficult to implement in an FPGA, for example.


Well, yeah, it's not replication if it's not an exact hardware replica, but the word "emulation" has very "software" connotations. I guess let's call it... recreation? (That word is even in the quote above!)


"FPGA re-implementation" may be a better term


So it's not perfect but it's better than emulators...


In latency and power usage, yes. In compatibility and accuracy, no. Both are Turing complete, so there's nothing you can do with one that you can't do with the other.

If you take the SNES core, my software emulator has 100% compatibility and no known bugs, and synchronizes all components at the raw clock cycle level. It also mitigates most of the latency concern through a technique known as run-ahead. But it does require more power to do this.


I'm really curious where you got "better" out of the quoted text. Because it's not there or implied, but people keep reading this into anything about FPGA recreations of chips. There's nothing inherently better about doing emulation on an FPGA or a CPU, other than basically the amount of electricity involved in doing it.

But people keep presuming an improved accuracy that there's no basis for.


Lower latency is definitely a thing. With an FPGA it's possible to 'chase the beam' like the original hardware, and have much reduced input latency from devices, etc. With an emulator you're going to be fighting the OS and the frameworks you build on top of. Even if you go "bare metal" (like my friend's BMC64 project, which runs a C64 emulator like a unikernel on the RPi with no OS) you are still dealing with hardware built for usage patterns very different from the classic systems. You're always going to be one or more frames behind.


That is true. There are however techniques software emulators can use like run-ahead that can get you lower latency than even the original hardware on a PC: https://near.sh/articles/input/run-ahead

The caveat is that it doesn't always work, and it makes the power requirements even more unbalanced. Some might also see it as a form of cheating to go below the original game's latency. If you want to match the original game's latency precisely, FPGAs are the way to go right now for sure.
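
Very roughly, and ignoring audio and a pile of edge cases, the loop looks like this (a toy sketch where the "emulator" is just a counter, purely to show the save/run/load dance; none of this is actual bsnes code):

    #include <cstdio>
    #include <string>

    struct ToyEmulator {
      int frame = 0;          // stand-in for the entire machine state
      int input = 0;

      int  save_state() const { return frame; }
      void load_state(int s)  { frame = s; }
      void set_input(int i)   { input = i; }

      // run one emulated frame; the returned string stands in for the video output
      std::string run_frame() {
        frame++;
        return "frame " + std::to_string(frame) + " with input " + std::to_string(input);
      }
    };

    int main() {
      const int runahead = 2;               // hide two frames of the game's internal input lag
      ToyEmulator emu;
      int true_state = emu.save_state();

      for (int real_frame = 0; real_frame < 3; real_frame++) {
        int input = real_frame;             // pretend we just polled the controller

        emu.load_state(true_state);         // rewind to the true timeline
        emu.set_input(input);
        emu.run_frame();                    // advance the true timeline (not shown to the user)
        true_state = emu.save_state();

        std::string video;
        for (int i = 0; i < runahead; i++)  // speculate ahead with the same input held
          video = emu.run_frame();

        std::printf("display: %s\n", video.c_str());  // show the future frame
      }
      return 0;
    }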


Run-ahead seems pretty cool, great technical write up. How would you compare this to the feature called frame-skipping that I often see implemented in software emulators?


Frame-skipping is just a speed hack of skipping rendering every other frame or so, and makes games very unenjoyable to play. It won't help with input lag at all.


Agreed about chasing the beam. With a SNAC addon and a CRT TV, you can even hook up original light guns to the MiSTer and they work perfectly.


Probably the marketing copy for Super NT and similar products... harder to get people to part with hundreds of dollars if your pitch is "lower power draw and reduced input delay"


Same story here. But in my case, Zoloft gave me really bad tinnitus that persisted even after stopping. I switched to Intuniv that got rid of the panic attacks but not the depression or the new tinnitus. Added Lexapro and that reversed the tinnitus and kept all the positive benefits of Zoloft. Brain chemistry is different for everyone, so it's worth trying more than one if the first doesn't work for you. It's truly life-changing when it works. No one should have to live with daily panic attacks.


> Therefore, the choice of which conversation, which comment, is entirely yours.

It's not that simple though. You're likely to be part of a broader community, and simply deciding to leave that community, and all of your friends, over the actions of one person is not very reasonable. Oftentimes we are forced to be around people we don't particularly like. When that person does something valuable, they get a level of protection from being reprimanded for their bad behavior that isn't afforded to outsiders of the group, so kicking out such people often becomes difficult as well.


Yes, but reasonable to leave over the actions of many people.


Different people follow for different things. Some people like seeing that other stuff. It would be very helpful if Twitter let accounts tag tweets into categories, and let people choose to subscribe or unsubscribe to those. Maintaining multiple Twitter accounts for different content is very tedious. No doubt your point is valid as well, it's easy to get addicted to that platform, and the likes and notification system gamifies that need for validation.


>>> It would be very helpful if Twitter let accounts tag tweets into categories, <<<

Something close is ‘Topics’, where Twitter automatically categorizes tweets into Topics such as Programming, Sports, Startup, etc. You can follow a topic and thus see only tweets in that group. So Mr A sends 2 tweets - one about his kids and the other about Python. If you’re following the topic ‘Programming’, you’ll only see the second tweet.


On a related note, I had always wished we at least got a complete PDF specification for the unreleased 65832, a 32-bit 6502 core. I'd love to implement it just for the novelty, even if nothing used it. http://www.mirkosoft.sk/65832.html


This made me wonder if anyone had designed a 64-bit 6502. Came across this http://www.6502.org/users/andre/adv65/index.html which in turn links to many different things, including a project they had started on making a 64-bit extension to the 6502 for fun.


There is too much TimeCube on that WWW site. Try this one.

* https://retrocomputing.stackexchange.com/questions/14864/wha...


From skimming the datasheet, it looks like a fairly bad product. Sure, the A register is extended to 32 bits, but the 8-bit data bus means that it's not going to be much faster than the 65816, and the address bus is still 24 bits. I have no clue why WDC decided to keep the same pinout as the 65816.


> I have no clue why WDC decided to keep the same pinout as the 65816.

The entire market for this new chip would have been 100% of the people with a 65816, and only those people. Companies do this for a couple of reasons: a last-ditch effort to get cash flow, bridging a gap left by a delayed product, extending a product line another cycle, etc.

Pinouts are powerful interfaces (semantically), look at the op-amp, what a wonderful design.

*edit: turns out this chip would have gone in the last model of the Apple IIGS, but it ended up with the 65816 instead.

I find this stuff fascinating, how technologies fold and morph over time. Esp with MIPS pivoting into RISC-V.


Oh, it even has a datasheet! Thanks, that's amazing! The missing SEV instruction could have been aliased to SEP #$40. The only thing missing to produce a working replica today (in hardware or software, just for fun of course) was indeed the XFE designation. The only place it could have gone was in the WDM prefix, but it could have been any byte after that. We would have to outright guess. He couldn't have reused XCE for it without risking perfect backward-compatibility (though of course he may have decided to do that anyway).


What would be wrong with blocking the entire /32 if you know the owner of it is using it in such a way?


Ad trackers often use an ISP or cloud provider shared with many other customers, and which network range an ISP has assigned to a given customer is not public information. Even when a company has its own AS, blocking it isn't always an option: Google, Oracle, IBM, and others could potentially serve ads from any IP in their networks, but they're too big to block.


> It's good to know you got 1M unique visitors last month, but I don't need to know I got 1,214,551 unique visitors. All analytic packages have problems like this.

The numbers vary depending on the technical prowess of your audience. Probably at least half the visitors to my site would be using ad blockers. If one were to use server-side logging (at the HTTP request level), it would not be blockable, and your numbers would be accurate save for any bot spam inflating them.
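
As a rough illustration of the idea (hypothetical: it assumes a common access-log format where the client address is the first whitespace-separated field, and unique addresses are of course only an approximation of unique visitors):

    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <unordered_set>

    int main(int argc, char** argv) {
      std::ifstream log(argc > 1 ? argv[1] : "access.log");   // hypothetical log path
      std::unordered_set<std::string> addresses;
      std::string line;
      while (std::getline(log, line)) {
        std::istringstream fields(line);
        std::string client;
        fields >> client;                  // first field: client address in common/combined log formats
        if (!client.empty()) addresses.insert(client);
      }
      std::cout << addresses.size() << " unique client addresses\n";
      return 0;
    }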


There's a magic point on YouTube at 10 minutes that lets creators insert ads in the middle of their videos, so they all stretch out the content for that purpose. It's very obnoxious.

