Google is not the research labs of the 70s; they haven't really done much in this space, and not recently.
Google's big thing in that area was maybe MapReduce, which was already quite behind the state of the art when they first introduced it.
I believe they eventually moved to BSP, which is more in line with what academia has been developing since the early 90s, but that's not really proprietary stuff.
But in any case, if you're just talking about scheduling graphs of tasks, there are thousands of competing libraries that do it.
I am not familiar with any particular research; I don't doubt that what I'm doing has been done before. I still think it's worth doing.
What I meant was that some ideas get trapped in corporations and are never open-sourced. If you have any papers you recommend, I would read them. I have a list of whitepapers on my GitHub profile.
Recently I was trying to parallelise the A* graph search algorithm for code generation. I got it parallel by sharding neighbour discovery per thread. This speeds it up, and it scales, because I sharded the problem space.
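A minimal sketch of what I mean by sharding neighbour discovery: the scoring of a node's neighbours is fanned out across a thread pool, while the merge back into the open set stays single-threaded. The graph shape and zero heuristic here are illustrative stand-ins, not my actual code generator, and with CPython's GIL this only demonstrates the structure, not a speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def a_star(graph, start, goal, heuristic, workers=4):
    # graph: node -> list of (neighbour, edge_cost) pairs
    g = {start: 0}
    open_heap = [(heuristic(start), start)]
    closed = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while open_heap:
            _, node = heapq.heappop(open_heap)
            if node == goal:
                return g[node]
            if node in closed:
                continue
            closed.add(node)

            # Shard neighbour scoring across threads...
            def score(nbr_cost):
                nbr, cost = nbr_cost
                return nbr, g[node] + cost

            # ...but merge results back into the open set sequentially.
            for nbr, tentative in pool.map(score, graph[node]):
                if tentative < g.get(nbr, float("inf")):
                    g[nbr] = tentative
                    heapq.heappush(open_heap,
                                   (tentative + heuristic(nbr), nbr))
    return None
```

With a zero heuristic this degenerates to Dijkstra, which keeps the example easy to check by hand.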
What if people could model problems as data-flow problems, so that parallelisation is automatic? In my experience all the tutorials and materials on the internet refer to threads or Go channels, which are low-level primitives compared to what I'm aiming for.
Thanks for the reference to "Bulk synchronous parallel", I had not heard of that.
I'm investigating turning arbitrary OOP-style code into LMAX Disruptor-style pipelined actor code that crunches through events in parallel.
My goal is that programs written in my notation are parallel by default, without explicit design, due to sharding and scheduling. I naively tried to shard a simple bank without my syntax and got 700 million requests per second on 12 threads, because I shard the money into different integers across threads. I want this kind of thinking to come by default.
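The "shard the money into different integers" idea, roughly sketched: each worker owns one private integer shard of the balance, requests are routed to a shard, and the global balance only exists as the sum of shards. The names and round-robin router here are illustrative, not my syntax; a hash of the account id would be the realistic router.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 12

def run_shard(deposits):
    # Each shard is a private int: no locks, no shared cache lines.
    balance = 0
    for amount in deposits:
        balance += amount
    return balance

def sharded_deposit(requests):
    # Route each request to a shard (round-robin for simplicity).
    shards = [[] for _ in range(NUM_SHARDS)]
    for i, amount in enumerate(requests):
        shards[i % NUM_SHARDS].append(amount)
    # Crunch every shard in parallel, then merge by summing.
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        return sum(pool.map(run_shard, shards))
```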
The automatic parallelisation I've seen is to do with loops or autovectorisation. I want to combine event streams with loop parallelisation. I am also interested in SIMD and multithreading together, but I've not studied that in any detail.
I am inspired by Alan Kay's original idea of OOP.
So I've turned objects into "actors" or "tasks" that emit "event arrays" or streams, and then I parallelise their processing. Loops are first-class citizens. This is inspired by coroutines and generators.
I turn control flow into data and parallelise that, in addition to data parallelism.
I use routing to shard.
Take the canonical example of being a search giant: you want to download multiple files, parse them for links, then index those links and save them somewhere. You have CPU-heavy tasks and IO-heavy tasks. I want to multiplex IO with coroutines per thread, and spread CPU tasks across multiple threads. You could say this is an example of what you describe as scheduling graphs of tasks.
url(url) | task download-url
  for url in urls:
    fire document(url, download(url))

document(url) | task extract-links
  parsed = parse(document)
  fire parsed-document(url, parsed)

parsed-document(parsed) | task fetch-links
  for link in document.query("a"):
    fire new-link(url, document, link)

new-link(url, document, link) | task save-data
  fire saved-link(url, link, db.save(url, link))

for url in ["http://samsquire.com/", "https://devops-pipeline.com/"]:
  fire url(url)
This program corresponds to the following event stream:
url() events go into one array, document() events into another, and parsed-document() events into a third, so they can be processed performantly by different threads and compiled down to loops.
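A toy sketch of those per-event-type arrays, with hypothetical fire/drain helpers (not my actual runtime): events of one type accumulate in a flat list and are drained in a batch by one tight loop over homogeneous data, instead of per-event dynamic dispatch.

```python
from collections import defaultdict

# One flat array per event type.
queues = defaultdict(list)

def fire(event_type, payload):
    queues[event_type].append(payload)

def drain(event_type, handler):
    # Swap the batch out, then process it as one hot loop --
    # the shape a compiler or JIT could vectorise.
    batch, queues[event_type] = queues[event_type], []
    return [handler(payload) for payload in batch]
```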
They can be routed to shards.
The events of this event stream can be dispatched from different threads (and machines) and routed to be crunched in parallel. Each event type corresponds to a "mailbox", which corresponds to a thread mapping, and we can define graph topologies, like you say, based on these objects and events.
Some of my thoughts:
* I am trying to combine coroutines with threads for efficient scheduling.
* If OOP interactions can be mapped to multidimensional array buffers, we can even inline code to be autovectorised as simple loops.
* I want to integrate my 1:M:N lightweight thread scheduler and epoll-based server (hopefully I can rewrite it to use liburing) with a general-purpose parallelising runtime.
* Vitess is a sharding database proxy; sharding is an extremely powerful parallelisation technique.
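The coroutines-for-IO plus threads-for-CPU split from the crawler example can be sketched with stock asyncio, assuming a placeholder fetch() and a stand-in CPU-heavy parse() (neither is my actual code): the event loop multiplexes the IO, and the thread pool absorbs the CPU work.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def parse(doc):
    # Stand-in for CPU-heavy work (link extraction, indexing).
    return doc.upper()

async def fetch(url):
    # Stand-in for a real non-blocking socket read.
    await asyncio.sleep(0)
    return f"<html>{url}</html>"

async def pipeline(urls):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # IO multiplexed on one event loop...
        docs = await asyncio.gather(*(fetch(u) for u in urls))
        # ...CPU work fanned out across threads.
        return list(await asyncio.gather(
            *(loop.run_in_executor(pool, parse, d) for d in docs)))
```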
https://en.wikipedia.org/wiki/Algorithmic_skeleton
Just go to the frameworks and libraries section.