The RF spectrum is a public good in the US, and there are requirements placed on the winners of those auctions to demonstrate that it provides some public benefit. A company can't just buy spectrum and sit on it, for example. They must start using it within a certain timeframe.
The RF spectrum is a common good, not a public good. Public goods are non-excludable and non-rivalrous. The RF spectrum is non-excludable (anyone can transmit on any frequency, given the right equipment) but rivalrous (transmitting on one frequency prevents others from using that frequency).
Requiring the winner of a spectrum auction to use it is a way to prevent anti-competitive tactics (since the government is granting a monopoly to the winner). The goal is to incentivize productive use of limited resources, not necessarily to benefit everyone. In theory, the winner could use the spectrum for entirely internal purposes, though in real-world spectrum auctions the government usually has stipulations such as requiring interoperability or the use of open standards. This reduces the value that the government captures, but likely increases the value that is created overall.
Before spectrum auctions, the government simply mandated what frequency bands were used for what, and by whom. Getting access usually meant lobbying and back-room deals. Sometimes the FCC used lotteries, which led speculators to enter the lotteries and then license out the access (basically capturing revenue that would have gone to the government had the spectrum been auctioned). In practice, auctions are the worst form of spectrum allocation, except for all the others.
All I can find on the Smithsonian is that they did press interviews, where various staff expressed opposition, and that they also sent some report to Congress. The press interviews are, quite naturally, public statements, and it could be argued they're unrelated to lobbying. As for the report, that's part of their normal duties - it would be a real catch-22 if such a report were considered lobbying. This feels like bluster from the politicians; they write dumb letters all the time for PR purposes.
The space shuttle situation, though, is a disaster.
So all articles will be open and free to read. The ACM Open subscription mainly includes publishing at a lower overall cost than the per-article rates, but also includes "AI-assisted search, bulk downloads, and citation management" and "article usage metrics, citation trends, and Altmetric tracking".
So, this is pretty difficult to test in a real-world environment, but I did a little LLM experiment. Two prompts, (A) "Implement a consensus algorithm for 3 nodes with 1 failure allowed." vs. (B) "Write a provably optimal distributed algorithm for Byzantine agreement in asynchronous networks with at least 1/3 malicious nodes". Prompt A generates a simple majority-vote approach and says "This code does not handle 'Byzantine' failures where nodes can act maliciously or send contradictory information." Prompt B generates "This is the simplified core consensus logic of the Practical Byzantine Fault Tolerance (PBFT) algorithm".
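For reference, the majority-vote code from prompt A was roughly along these lines (a sketch reconstructing the idea, not the verbatim LLM output; the callable-node setup is just illustrative):

    from collections import Counter

    def collect_votes(nodes):
        # Ask each node for its proposed value; skip nodes that have crashed.
        votes = []
        for node in nodes:
            value = node()            # each "node" is just a callable here
            if value is not None:     # None models a crashed / unresponsive node
                votes.append(value)
        return votes

    def decide(votes, total_nodes=3):
        # Require a strict majority of ALL nodes, so one crash is tolerated
        # but a split or mass failure yields no decision. Nothing here handles
        # Byzantine (malicious / contradictory) behavior.
        if not votes:
            return None
        value, count = Counter(votes).most_common(1)[0]
        return value if count > total_nodes // 2 else None

    nodes = [lambda: "commit", lambda: "commit", lambda: None]  # third node crashed
    print(decide(collect_votes(nodes)))  # -> commit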
I would say, if you have to design a good consensus algorithm, PBFT is a much better starting point, and can indeed be scaled down. If you have to run something tomorrow, the majority-vote code probably runs as-is, but doesn't help you with the literature at all. It's essentially the iron triangle - good vs. cheap. In the talk the speaker was clearly aiming for quality above all else.
The dates are the dates of the sources; he says in the talk that he wasn't going to try to infer the dates when these ideas were invented. Also, he barely talked about Alan Kay.
From the video: "It's like, yeah, he said that in 2003, right? He said that after a very long time. So why did he say it? It's because 10 years earlier, he was already saying he kind of soured on it."
Casey says he “didn’t really cover Alan Kay” (https://youtu.be/wo84LFzx5nI?t=8651). To me that says that Kay wasn’t a major focus of his research. That seems to be reflected in the talk itself: I counted 6 Bjarne sources, 4 Alan Kay sources, 2 more related to Smalltalk, and about 10 focused on Sketchpad, Douglas Ross, and others. By source count, the talk is roughly 18% about Alan Kay and 27% about Smalltalk overall - not a huge part.
As far as the narrative, probably the clearest expression of Casey's thesis is at https://youtu.be/wo84LFzx5nI?t=6187 : "Alan Kay had a degree in molecular biology. ... [he was] thinking of little tiny cells that communicate back and forth but which do not reach across into each other's domain to do different things. And so [he was certain that] that was the future of how we will engineer things. They're going to be like microorganisms where they're little things that we instance, and they'll just talk to each other. So everything will be built that way from the ground up." AFAICT the gist of this is true: Kay was indeed inspired by biological cells, and that is why he emphasized message-passing so heavily. His undergraduate degree was in math + bio, not just bio, but close enough.
As far as specific discussion, Casey says, regarding a quote on inheritance: https://youtu.be/wo84LFzx5nI?t=843 "that's a little bit weird. I don't know. Maybe Alan Kay... will come to tell us what he actually was trying to say there exactly." So yeah, Casey has already admitted he has no understanding of Alan Kay's writings. I don't know what else you want.
I nearly jumped out of my proverbial seat with joy when Casey talked about it being about where you draw your encapsulation boundaries. YES! THIS IS THE THING PEOPLE ARGUING ABOUT OOP NEVER SEEM TO ADDRESS DIRECTLY!
Honestly would love to see a Kay and Casey discussion about this very thing.
I find the discussions about real domain vs OOP objects to be a bit tangential, though still worth having. When constructing a program from objects, there’s a ton of objects that you create that have no real-world or domain analogs. After all, you’re writing a program by building little machines that do things. Your domain model likely doesn’t contain an EventBus or JsonDeserializer; that purely exists in the abstract ‘world’ of your software.
Here’s a thought: Conceptually, what would stop me from writing an ECS in Smalltalk? I can’t think of anything off the top of my head (whether I’d want to or not is a different question). Casey even hints at this.
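For what it's worth, here is roughly what I mean by an ECS, sketched in Python rather than Smalltalk (all names are made up for illustration): components are plain data keyed by entity id, and systems just iterate over them - nothing in a Smalltalk-style object system forbids building that.

    from dataclasses import dataclass

    @dataclass
    class Position:
        x: float
        y: float

    @dataclass
    class Velocity:
        dx: float
        dy: float

    class World:
        def __init__(self):
            self.next_id = 0
            self.components = {}  # component type -> {entity id -> component instance}

        def spawn(self, *comps):
            # Entities are just ids; their data lives in per-type component stores.
            eid = self.next_id
            self.next_id += 1
            for c in comps:
                self.components.setdefault(type(c), {})[eid] = c
            return eid

        def query(self, *types):
            # Yield (entity id, comp, comp, ...) for entities having every requested type.
            ids = set.intersection(*(set(self.components.get(t, {})) for t in types))
            for eid in ids:
                yield (eid, *(self.components[t][eid] for t in types))

    class MovementSystem:
        def update(self, world, dt):
            for _, pos, vel in world.query(Position, Velocity):
                pos.x += vel.dx * dt
                pos.y += vel.dy * dt

    world = World()
    world.spawn(Position(0.0, 0.0), Velocity(1.0, 2.0))
    MovementSystem().update(world, dt=1.0)

The shape carries over directly if you squint; whether the message-passing overhead would be acceptable is a separate question.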
This is probably the best Casey talk I’ve ever seen and one of the clearest definitions of ‘here is my problem with OOP’. I don’t agree with everything necessarily, but it’s the first time I’ve watched one of these and thought “yep they actually said the concrete thing that they disagree with”.
Ok so jank is Clojure but with a C++/LLVM runtime rather than the JVM. So already all of its types are C++ types, which presumably makes things a lot easier. Basically it just uses libclang / CppInterOp to get the corresponding LLVM types and then emits a function call. https://github.com/jank-lang/jank/blob/interop/compiler%2Bru...
Most Linux distros have vastly more mirror infrastructure than they will ever need. And the torrent availability being through the roof is just the cherry on top.
Python is just a beautiful, well-designed language - in an era where LLMs generate code, it is kind of reassuring that they mostly generate beautiful code and that Python has risen to the top. If you look at the graph, Julia and Lua also do incredibly well, despite being a minuscule fraction of the training data.
But Python/Julia/Lua are by no means the most natural languages - what is natural is what people write before the LLM, the stuff that the LLM translates into Python. It is hard to get a good look at these "raw prompts" since the LLM companies keep those datasets closely guarded, but from HumanEval, MBPP+, YouTube videos of people vibe coding, and such, it is clear that they are mostly English prose with occasional formulas and code snippets thrown in; it is also not "ugly" text but generally pre-processed through an LLM. So from my perspective the next step is to switch from Python as the source language to prompts as the source language - integrating LLMs into the compilation pipeline is a logical step. But currently they are too expensive to use consistently, so this is blocked by the economics of hardware development.
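Concretely, the pipeline I'm imagining is something like the sketch below, where generate_code is a hypothetical stand-in for whatever LLM completion call you use and Python is just the intermediate representation:

    def generate_code(prompt: str) -> str:
        # Hypothetical stand-in for an LLM call that translates an English
        # prompt into Python source; any provider's completion API would do.
        raise NotImplementedError

    def compile_prompt(prompt: str):
        source = generate_code(prompt)              # the prompt is the "source language"
        return compile(source, "<prompt>", "exec")  # Python acts as the IR

    # module = compile_prompt("read points.csv and print the centroid")
    # exec(module)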
mhhm yes yes. There's a thread of discussion that I chose not to delve into in the post, but there is something interesting in the observation that languages close to natural language (Python was famous for being almost executable pseudo-code for a while) are easier for LLMs to generate.
Maybe designing new languages to be close to pseudo-code would lead to better results when asking LLMs to generate them? But there's also a fear that prose-like syntax might not be the most appropriate for some problem domains.
Wasn't there that thing about how large LLMs are essentially compression algorithms (https://arxiv.org/pdf/2309.10668)? Maybe that's where this article is coming from: the idea that finetuning "adds" data to the set of data that compresses well. But that indeed doesn't work unless you mix the finetuning data in with the original training corpus of the base model. I think the article is wrong though in saying it "replaces" the data - it's true that finetuning without keeping the original training corpus increases loss on the original data, but "large" in LLM really is large, and current models are not trained to saturation, so there is plenty of room to fit in finetuning if you do it right.
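By "mix the finetuning data in", I mean something like the usual replay trick, sketched here (dataset names and the 50/50 ratio are just illustrative):

    import random

    def mixed_batches(finetune_data, original_corpus, batch_size=32, replay_ratio=0.5):
        # Interleave fine-tuning examples with samples "replayed" from the original
        # training corpus so the model keeps seeing the old distribution.
        while True:
            batch = []
            for _ in range(batch_size):
                if random.random() < replay_ratio:
                    batch.append(random.choice(original_corpus))
                else:
                    batch.append(random.choice(finetune_data))
            yield batch

    # e.g. train_step(model, next(mixed_batches(my_finetune_set, base_corpus_sample)))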
Not sure what you mean by “not trained to saturation”. Also, I agree with the article: in the literature, the phenomenon it refers to is known as “catastrophic forgetting”. Because no one has specific knowledge of which weights contribute to model performance, by updating the weights via fine-tuning you are modifying the model such that future performance will change in ways that are not understood. Also, I may be showing my age a bit here, but I always thought “fine-tuning” meant performing additional training on the output network (traditionally a fully-connected net) while leaving the initial portion (the “encoder”) weights unchanged - allowing the model to capture features the way it always has, but updating the way it generates outputs based on the discovered features.
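That older sense of fine-tuning - freeze the encoder, retrain only the head - looks roughly like this in PyTorch (layer sizes are made up, not tied to any particular model):

    import torch.nn as nn
    import torch.optim as optim

    encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 128))  # pretrained feature extractor
    head = nn.Linear(128, 10)  # new task-specific output layer

    # Freeze the encoder: features keep being extracted the way they always were.
    for p in encoder.parameters():
        p.requires_grad = False

    model = nn.Sequential(encoder, head)

    # Only the head's parameters go to the optimizer, so only the output
    # mapping is updated during this style of fine-tuning.
    optimizer = optim.Adam(head.parameters(), lr=1e-3)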
OK, so this intuition is actually a bit hard to unpack; I got it from bits and pieces. There's this post: https://www.fast.ai/posts/2023-09-04-learning-jumps/. Essentially, a single pass over the training data is enough for the LLM to significantly "learn" the material. In fact, if you read the LLM training papers, for the large-large models they generally explicitly say that they only did 1 pass over the training corpus, and sometimes not even the full corpus, only like 80% of it or whatever. The other relevant information is the loss curves - models like Llama 3 are not trained until the loss on the training data is minimized, like typical ML models. Rather, they use approximate estimates of FLOPS / tokens vs. performance on benchmarks. But it is pretty much guaranteed that if you continued to train on the training data it would continue to improve its fit - 1 pass over the training data is by no means enough to adequately learn all of the patterns. So from a compression standpoint, the paper I linked previously says that an LLM is a great compressor - but it's not even fully tuned, hence "not trained to saturation".
Now, as far as how fine-tuning affects model performance, it is pretty simple: it improves fit on the fine-tuning data and decreases fit on the original training corpus. Beyond that, yeah, it is hard to say whether fine-tuning will help you solve your problem. My experience has been that it always hurts generalization, so if you aren't getting reasonable results with a base or chat-tuned model, then fine-tuning further will not help; but if you are getting results, then fine-tuning will make them more consistent.
Always appreciated the work of Jeremy Howard. Also had a lot of fun using the Fast.ai framework. My experience is similar to your description. When using 2, 3, or more epochs, I felt that overfitting started to emerge. (And I was CERTAINLY not training models anywhere near the size of modern LLMs.) I suppose in this case by “saturation” you meant training “marginally before exhibiting over-fitting” - something akin to “the elbow method” w.r.t. clustering algorithms? I’ll have to chew on your description of overfitting results for a while. It jibes with mine, but in a way that really makes me question my own - thanks for the thought-provoking response!
I was thinking this was about leaking the kernels or something, but no, they are "publishing" them in the sense of putting out the blog post - they just mean they are skipping the peer review process and not doing a formal paper.