
Binary search is the obvious example.

What it and your example have in common is that a significant constraint exists on the input. I can't imagine how a deterministic algorithm with unconstrained input can process that input in sublinear time, but I would love to learn otherwise.


Binary search requires a sorted input, which requires that you first consider all elements of the input data set.

The sublinear algorithms this page is discussing require that the algorithm not consider all elements of the data set and you're not allowed to pre-process it with an O(n) or greater algorithm.

So no trees, no binary search. It's a different set of algorithms.


I can't imagine that's what they meant? The text very specifically says: "Indeed, it is hard to imagine doing much better than that, since for any nontrivial problem, it would seem that an algorithm must consider all of the input in order to make a decision." For them to be thinking of binary search, they would have to be effectively saying "it is hard to think of binary search", which would be a rather absurd position from a CS professor, especially given binary search is quite literally the first algorithm every programmer learns.

So I took it to mean there's something interesting here where the inputs could literally be anything, not heavily constrained. But I can't imagine what that might be.


> especially given binary search is quite literally the first algorithm every programmer learns.

I get what you're saying, and it doesn't change your point, but: no _way_ is binary search the first algorithm people learn. For binary search to even be intelligible you already have to know, at a minimum, linear search and the concept of sorting (you probably know a sorting algorithm first). You also learn dozens of other really simple algorithms first in practice.


> no _way_ is binary search the first algorithm people learn. For binary search to even be intelligible you already have to know, at a minimum, linear search and the concept of sorting

Almost every 6-10 year old kid who had to use physical dictionaries intuitively learned (probably even discovered by themselves) something like binary search. It's a different matter whether they could formalize that into an algorithm and write code to handle all the edge cases. But the basic idea is very intuitive. Kids can also pick up the intuition to incorporate improvements beyond a balanced binary split, e.g. there might be a lot of words starting with "S", so split into two groups at a little less than the middle, etc.
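
The "split closer to where the words probably are" intuition is essentially interpolation search. A minimal sketch in Go (my own illustration, not anyone's course material): estimate the target's position from the value distribution instead of always halving.

    package main

    import "fmt"

    // interpolationSearch guesses where x should sit by assuming the
    // values are roughly evenly spread, like flipping further toward "S"
    // in a dictionary, then narrows the range as usual.
    func interpolationSearch(a []int, x int) int {
        lo, hi := 0, len(a)-1
        for lo <= hi && x >= a[lo] && x <= a[hi] {
            if a[hi] == a[lo] { // equal endpoints: avoid dividing by zero
                if a[lo] == x {
                    return lo
                }
                break
            }
            // Proportional estimate of x's position within [lo, hi].
            pos := lo + (x-a[lo])*(hi-lo)/(a[hi]-a[lo])
            switch {
            case a[pos] == x:
                return pos
            case a[pos] < x:
                lo = pos + 1
            default:
                hi = pos - 1
            }
        }
        return -1 // not present
    }

    func main() {
        fmt.Println(interpolationSearch([]int{2, 4, 8, 16, 32, 64}, 16)) // 3
    }

On uniformly distributed keys this takes O(log log n) expected probes, though it degrades badly on skewed distributions, which is arguably the same lesson the dictionary teaches.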


> Almost every 6-10 year old kid who had to use physical dictionaries intuitively learned (probably even discovered by themselves) something like binary search.

I find this highly unlikely. It might be true for those children who grow up to study computer science, though.


I would argue a dictionary word lookup problem for a six year old is closer to a skip list than a binary search.

Moving the goal post?

If you are asking which is the "first algorithm" a human learns in their life, then it's likely more related to movement (crawl? walk? move food towards mouth?) or selection (which item can I eat? who are my parents?) than to a physical dictionary. And that's setting aside that it's been a while since kids regularly encountered a physical dictionary.

If you are asking about formal algorithms, then we're talking about the beginning of a programmer's or computer scientist's education, and then it's usually some form of O(n^2) sort that they will encounter first, if we don't count things like "how to add two multi-digit integers", which is typically an algorithm every kid learns in primary school.

Binary search tends to be one of the first recursive algorithms that are taught, which is another level entirely in terms of intellectual development.


I guess my response was based on how I read your comment as fitting into the higher-level discussion. My main point is that many of these algorithms are intuitive, and kids learn these much earlier than when they learn formal programming (which might typically be in their teens).

Looking over your comment again, I also don't dispute that linear search and sorting are simpler -- even toddlers learn these.


> no _way_ is binary search the first algorithm people learn

It legit was the first one we learned, the first algorithm written on the blackboard by the professor (this was in the 2000s but the first algorithm lessons were on blackboard and paper!)

Probably because something simpler and linear, like "find the minimum value in a list", is too dull as an algorithm example.


Agreed, WRT the bigger picture; lots of little algorithms could come before binary search.

But, giving them binary search before sorting kinda works. It is motivating. If you do sorting first, it just seems like a weird high-effort excursion into a niche bookkeeping thing about lists. Once they see how fast binary search is (just give them pre-sorted lists to run it on), sorting becomes interesting, right?


This is what I do in my introductory data structures and algorithms course at a Bay Area community college: I teach binary search as part of my introduction to recursion, and then the following lectures are a series of lessons on sorting algorithms, beginning with selection sort and then insertion sort. After teaching these O(n^2) algorithms, I spend a lecture on merge sort and then have a lecture on quicksort, both for covering O(n lg n) sorts and for introducing my students to divide-and-conquer recursive algorithms.
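
For concreteness, here's roughly the shape of the recursive binary search such an intro might build toward; a sketch in Go (the course presumably uses its own language and style):

    package main

    import "fmt"

    // binarySearch returns the index of x in the sorted slice a,
    // or -1 if x is absent, recursing on the half that can contain x.
    func binarySearch(a []int, x, lo, hi int) int {
        if lo > hi {
            return -1 // base case: empty range
        }
        mid := lo + (hi-lo)/2 // written this way to avoid overflow
        switch {
        case a[mid] == x:
            return mid
        case a[mid] < x:
            return binarySearch(a, x, mid+1, hi) // right half
        default:
            return binarySearch(a, x, lo, mid-1) // left half
        }
    }

    func main() {
        a := []int{1, 3, 5, 7, 9, 11}
        fmt.Println(binarySearch(a, 7, 0, len(a)-1)) // 3
    }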

It is a shame that quicksort has to be covered. I mean, it does have to be covered. But it has an O(n^2) cost for a particular input, despite being generally considered O(n log n), and that seems to me to introduce some fuzziness into an otherwise solid concept.

But it does need to be covered.

Unfortunately.

(Mergesort is best).


Quicksort is an important, extremely flexible, and very hard to beat unstable comparison sort.

It's based on a very simple but powerful idea/strategy (divide & conquer). Its flexibility means it can be adapted to partially sort, find top-N, find the median or any other rank, all optimally.

And it's so much faster in practice than everything else (why?) that even after mitigating its worst case, it often comes out ahead.

Also, it is relevant/necessary to teach the concept of average, best, and worst cases in complexity analysis. What better way to do it than "the best sorting algorithm is terrible for some inputs"?

You can also use it to teach/learn adaptive algorithms (you're almost expected to): switch to something else on the base case; or on the worst case; can we do better for low cardinality; etc.

So, of course it needs to be covered. There's more to learn from 200 lines of Quicksort than from Mergesort: https://github.com/ncruces/sort/blob/main/quick/quick.go
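
The "find the median or any other rank" flexibility mentioned above is quickselect: keep quicksort's partition step but recurse into only the side containing the target rank, giving expected O(n). A hedged sketch in Go (mine, not from the linked repo):

    package main

    import (
        "fmt"
        "math/rand"
    )

    // quickselect returns the k-th smallest element (0-indexed),
    // partially sorting a in the process.
    func quickselect(a []int, k int) int {
        lo, hi := 0, len(a)-1
        for lo < hi {
            // Random pivot defuses the O(n^2) worst case on adversarial input.
            p := lo + rand.Intn(hi-lo+1)
            a[p], a[hi] = a[hi], a[p]
            i := lo
            for j := lo; j < hi; j++ { // Lomuto partition around a[hi]
                if a[j] < a[hi] {
                    a[i], a[j] = a[j], a[i]
                    i++
                }
            }
            a[i], a[hi] = a[hi], a[i]
            switch {
            case k == i:
                return a[i]
            case k < i:
                hi = i - 1
            default:
                lo = i + 1
            }
        }
        return a[lo]
    }

    func main() {
        a := []int{9, 1, 8, 2, 7, 3}
        fmt.Println(quickselect(a, 3)) // 4th smallest: 7
    }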


> very hard to beat

radix sort

If you're forced to use comparison sorting, have to write it by hand, the data is near random, etc., then quicksort isn't that bad, but even then there are better options.


Yes, I said comparison sort.

And it is hard to beat, which is why the standard libraries of languages like C++, Rust, Go, C#, Java (etc) use some quicksort variant for their unstable sorts (and some mergesort variant when they need stability).

All of the above aren't forced to use comparison sorts, they just do; no other caveats required.


> Yes, I said comparison sort.

You mentioned it was a comparison sort, but not that you were only comparing it with other comparison sorts.

There's also a huge caveat: libraries know less about your data than you do. Thus different default choices become optimal, but any decent library will give you many options, for very good reasons.

Quicksort is relatively terrible for partially ordered lists, like appended timestamps from multiple processes, etc. It's only OK at a very narrow range of problems, and the important bit isn't the implementation but where those borders are.


> Quicksort is relatively terrible for partially ordered lists like appended timestamps from multiple processes etc etc.

It's not, not really: pdqsort handles those organically.

I'm sorry if "best sort", "so much faster" and "very hard to beat" felt like baits. But "huge caveat", "relatively terrible" and being "forced to use comparative sorting" are not fair descriptions of quicksort or why it's used and chosen by standard libraries.

Regardless, my point was that there's a lot to learn from quicksort. It's not “a shame” that it must be taught, and mergesort is not “best.”


Mergesort is best because it is the most elegant and beautiful sorting algorithm. Just merge lists so that they stay sorted.

Quicksort has all sorts of nonsense bouncing around and picking pivots. Eww. Terrible.

I will admit that I could have been more clear that I was evaluating algorithms based on beauty, but given that info I’m sure the conclusion is obvious.


Ehh objectively false is objectively false.

pdqsort fails to detect many partially ordered lists.

So you might argue about the scale of "huge caveat", but trying to discover information about a list takes computing cycles. Even the fact that both pdqsort and pdqsort_branchless exist is due to that exact caveat; it's fundamental to the nature of the problem. There are infinitely many ways data could have an inherent pattern that would speed up sorting it, and no way to algorithmically notice all such patterns so as to universally speed up sorting.

As for what there is to learn from quicksort: I think it's a poor introduction to algorithms, not because of its nature as an algorithm, but because early on its many pitfalls draw attention away from more useful topics. Later on, it's simply not complex enough to be worth much attention. So sure, in an academic context it looks really appealing, yet when you dig into what the point of teaching algorithms is, it's much harder to justify. It's covered so frequently you're not even going to see it in interviews.


> But it has an O(n^2) cost for a particular input

Even worse is the fact that for naive implementations (such as students might come up with), the worst-case behaviour occurs in very common situations, such as sorting already-sorted or reverse-sorted lists.


We have different notions of "covered", then. When I teach algorithms and introduce quicksort, the majority of the time is spent discussing strategies for choosing the pivot. I expect none of my students to implement a quicksort with bad pivot selection; if they do, that's my failure as a teacher and definitely failure in "coverage" of the algorithm.
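
One such strategy is median-of-three: pivot on the median of the first, middle, and last elements, which defuses the classic already-sorted and reverse-sorted worst cases. A minimal sketch in Go (an illustration, not a claim about any particular course):

    package main

    import "fmt"

    // medianOfThree orders a[lo], a[mid], a[hi] in place and returns mid,
    // leaving the median of the three at the pivot index.
    func medianOfThree(a []int, lo, hi int) int {
        mid := lo + (hi-lo)/2
        if a[mid] < a[lo] {
            a[mid], a[lo] = a[lo], a[mid]
        }
        if a[hi] < a[lo] {
            a[hi], a[lo] = a[lo], a[hi]
        }
        if a[hi] < a[mid] {
            a[hi], a[mid] = a[mid], a[hi]
        }
        return mid
    }

    func main() {
        a := []int{5, 4, 3, 2, 1} // reverse-sorted: worst case for "pick first"
        fmt.Println(medianOfThree(a, 0, len(a)-1)) // 2: an even split
    }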

Yeah, it does work as something you learn with/right-before/right-after a sorting algorithm, depending on the teaching style.

It was actually the first algorithm I discovered and learned in a technical environment, when I was debugging my Skyrim mods list and realized I needed an efficient way to discover which of my hundreds of active mods were interacting, causing the dreaded neck-seam issue (It was Ethereal Elven overhaul and another whose name escapes me.)

It's an unusually intuitive algorithm, so it wouldn't surprise me if it were one many people learn first.


They say a little ways down:

> there are classical optimization problems whose values can be approximated in sublinear time

This can actually be quite useful if the approximation is guaranteed, or even if it isn't, as long as it works well in practice.

https://en.wikipedia.org/wiki/Hardness_of_approximation


Here are a few examples, linked in the article:

https://www.dcs.warwick.ac.uk/~czumaj/PUBLICATIONS/DRAFTS/Dr...

Searching in sorted lists is the first example, although they acknowledge that "the assumption that the input array is sorted is not natural in typical applications." They then give a non-trivial variant of this problem, where the sorted list is stored as a linked list, so that you cannot directly jump to the element at position i.
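
For flavor, the standard trick for that variant (as I understand it; this is my sketch, not code from the draft) is to sample about sqrt(n) random nodes from the backing array, start from the sampled node with the largest value not exceeding the target, and then walk the next pointers, for expected O(sqrt(n)) time:

    package main

    import (
        "fmt"
        "math"
        "math/rand"
    )

    // node lives at some array index; next is the index of its
    // successor in sorted order, or -1 at the end of the list.
    type node struct {
        val, next int
    }

    func search(nodes []node, head, x int) bool {
        n := len(nodes)
        start, best := head, nodes[head].val
        // Sample ~sqrt(n) nodes; remember the best predecessor of x found.
        for i := 0; i < int(math.Sqrt(float64(n))); i++ {
            if j := rand.Intn(n); nodes[j].val <= x && nodes[j].val > best {
                start, best = j, nodes[j].val
            }
        }
        // Walk forward from there; expected O(sqrt(n)) steps remain.
        for i := start; i != -1; i = nodes[i].next {
            if nodes[i].val == x {
                return true
            }
            if nodes[i].val > x {
                return false // walked past where x would be
            }
        }
        return false
    }

    func main() {
        // The list 1 -> 3 -> 5 stored out of order in the array.
        nodes := []node{{5, -1}, {1, 2}, {3, 0}}
        fmt.Println(search(nodes, 1, 3)) // true
    }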

Another example for a sublinear algorithm they give is to check whether 2 convex polygons intersect, using the algorithm by Chazelle and Dobkin.


It is obvious that binary search leverages the property that the input is sorted in order to ignore part of the input. It is less obvious to see it abstractly as an exotic specimen of sublinear exact algorithm rather than merely as a simple special case of search, and it is even less obvious to investigate what weaker (and hopefully cheaper to guarantee) input constraints allow sublinear search algorithms.

I read that as saying that binary search isn't among those "nontrivial problem"s, along with most other things with known exact deterministic sublinear time algorithms.

And your first quote is followed by "However, for most natural problems", which further indicates that the known exact algorithms are for trivial problems.


Another obvious example: what's the mean of an unsorted list of integers? If you do a random sample of sqrt(n) values and take the mean over that, your guess is with high probability pretty good. Or a log(n) sample. (That's how election polling works too, which uses even an O(1) sample, though not a random one.)

Edit: Ah, GP asked for deterministic exact.
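
A minimal sketch of that estimator in Go; the catch (see the reply below) is that the guarantee depends on the values being reasonably bounded, since a single huge outlier can dominate the true mean while rarely landing in the sample:

    package main

    import (
        "fmt"
        "math"
        "math/rand"
    )

    // approxMean averages ~sqrt(n) uniformly sampled elements
    // (with replacement) instead of reading the whole slice.
    func approxMean(a []int) float64 {
        k := int(math.Sqrt(float64(len(a)))) + 1
        sum := 0
        for i := 0; i < k; i++ {
            sum += a[rand.Intn(len(a))]
        }
        return float64(sum) / float64(k)
    }

    func main() {
        a := make([]int, 1_000_000)
        for i := range a {
            a[i] = i % 100 // true mean is 49.5
        }
        fmt.Printf("%.1f\n", approxMean(a)) // close to 49.5, w.h.p.
    }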


What if one of the integers is significantly larger than the rest?

I mean, yeah, binary search is sublinear, but the data has to be ordered into a binary search tree to be able to use it, which has a much more familiar (non-sub-linear) runtime.

I have to assume the reason for the article wasn't to talk about the runtime of algorithms that operate on data that's already in a partially solved state.


My guess is that the car is looking for a place to pull over (perhaps because the rider pressed the "pull over" button). In this mode, the car hugs the right side of the road until it can find a spot to stop, which would normally work alright, except in this one particular spot where the "right side of the road" is only this tiny island.

So the car just circles around it indefinitely.


It's all of these edge cases that make self-driving cars impractical outside of optimal conditions.


Buridan's principle will tell you that there will always be edge cases.

There will always be room for improvement. The best you can do is to make them vanishingly rare.


That would be the definition of an edge case


Islands are so common in driving, I would expect them to be anticipated or encountered early on in development.


> One recent idea I've had is that many online subscription services should automatically pause if you stop using it.

Cool idea, but probably tough to enforce what “using it” means. I could see companies start sending newsletters to customers and calling that engagement


This wouldn't survive the courts, so approximately one company would get away with it for a time.


No serious operator would choose a provider with your implementation of that feature.


I agree that no serious operator would choose a provider whose spending cap feature was "enable this feature and we kick you off the platform".


People will pay to be abused: cf. Oracle.


I wonder about OpenAI's moat. Thanks to advances in hardware and rapidly improving open-source ML frameworks, it's getting significantly easier and cheaper to replicate what they've built over time. That's not the case for most startups: it's not really any easier to build an Uber clone today than it was ten years ago.

OpenAI depends on spending vast amounts of money to stay a year or two ahead of the competition. I'm doubting whether that's a justified tradeoff.


Open source just caught up to GPT-4, which was released over a year ago. You don't think all of the advances in hardware also play into their hands? They have GPT-5 in the pipeline and are likely hard at work planning and prepping for GPT-6. A year or two beyond the competition, at this point, is their moat.


> A year or two beyond the competition, at this point, is their moat.

That doesn't seem like much of a moat.

> They have GPT-5 in the pipeline and are likely hard at work planning and prepping for GPT-6.

I wouldn't be surprised if there's an element of diminishing returns here. The improvement from GPT-4 to GPT-5 is likely much less than from GPT-3 to GPT-4.


And they are going to run into source data problems. The internet is not producing that much more quality content, and now they have a problem with AI-generated clickbait.


I think your comment is emblematic of the very divide between Silicon Valley and the rest of the world that leads to frustration like what the artists here are expressing.

No one becomes an artist except out of passion for the work. It sure as hell isn't for the money. That's not so universally true in Silicon Valley, and I'm guessing you're one of those people who views their job mostly as a means to fund the rest of their lifestyle.


Glad to see this development. The amount of FUD around AVs is too high, and allowing each individual municipality to set their own regulations for AVs would have been a ridiculous amount of red tape for these companies to deal with. Just to pull one particularly bad quote from this article:

> “I hope that, in the meantime, our communities do not suffer too much in terms of injuries and community damages due to the current regulatory gaps,” Cortese said in a statement.

What gaps? What injuries? What community damages? If someone can actually present statistics that these cars are more socially dangerous than an equivalent amount of Ubers and Lyfts, I would be very, very surprised.


It's been nonstop lies from opponents of this technology, especially South Bay opponents, with only the tiniest kernel of truth in that Cruise has not been very responsible. I still can't get over San Mateo County flatly lying about Waymo not talking to them (https://www.cpuc.ca.gov/-/media/cpuc-website/divisions/consu..., "in its protests, San Mateo stated...").


When you strip away some of the more pretentious language, this article isn't really saying anything that extraordinary. If you just swap the phrase "information" with "content", i.e. "addiction to useless content", it becomes clear that the article is really just talking about the dangers of doomscrolling.

At best, it's a useful reminder that doomscrolling happens not only on Instagram and YouTube, but also on "respectable" sites like the NYT, Hacker News, and Wikipedia, and that does resonate with me. But I don't see anything else that the author is really adding to the large corpus of discussion on doomscrolling and Internet addiction.


It shouldn't take a Vox article to ensure employees' basic security over their compensation. The fact that this provision existed at all is exceptionally anti-employee.


OpenAI's terrible, horrible, no good, very bad month only continues to worsen.

It's pretty established now that they had some exceptionally anti-employee provisions in their exit policies to protect their fragile reputation. Sam Altman is bluntly a liar, and his credibility is gone.

Their stance as a pro-artist platform is a joke after the ScarJo fiasco, which clearly illustrates that creative consent was an afterthought. Litigation is assumed, and ScarJo is directly advocating for legislation to prevent this sort of fiasco in the future. Sam Altman's involvement is again evident from his trite "her" tweet.

And then they fired their "superalignment" safety team for good measure. As if to shred any last measure of doubt that this company is somehow more ethical than any other big tech company in their pursuit of AI.

Frankly, at this point, the board should fire Sam Altman again, this time for good. This is not the company that can, or should, usher humanity into the artificial intelligence era.

