More than $0, so definitely more than they're willing to lose in a Mexican court case that costs money and reputation, especially given that the local courts are unlikely to rule in their favour.
That's the correct answer: IBM wanted the crawler mostly to feed Watson. Building a full search engine (crawler, indexer, ranker, API, web application) for the English language was a hell of an accomplishment, but by the time Blekko was acquired, Google was paying out tens of billions of dollars to people to send them, and only them, their search queries. For a service that nominally has to live on advertising revenue, getting humans to use it was the only way to be net profitable, and you can't spend billions buying traffic and hope to make it back on advertising as the #3 search engine in English-speaking markets.
There are other ways to monetize search than advertising (look at Kagi, for example). Blekko missed that window though. (Too early; Google needed to get as crappy as it is today to make a spam-free search engine desirable.)
Blekko was gone by the time I learned about it. Recently (past few years) I emailed someone who worked on Blekko to get his opinion on a search engine concept I still have yet to start. His advice was to not bother competing with Google (obviously) LOL!
I don’t know if anyone’s embarked on a P2P search engine but that’s essentially my concept. Anyhoo, thanks for the inspiration!
Peer to peer would be tough, you really need a 10G network connection to some tier 1 provider, and about 2500 machines to distribute the crawling/serving load. (that is if you want to do a full stack search engine). And while you can run that infrastructure for on the order of $100K/month (not counting depreciation) that means you need roughly $5K/day in revenue from that cluster. At $10 RPM ($10 revenue per thousand queries) you're looking at a minimum of 500,000 'real' search queries during 'English time' (roughly 7AM to 11PM GMT). That's 31,250 queries per hour or ~9 queries per second (average).
And that just pays to keep the lights on at the colocation center. If you're paying off the development costs (30-50 developers over 2-3 years) and the cost of an office somewhere, you'll want at least double that revenue or you'll go broke before you break even.
Ideally you are the 'go to' place for people looking to buy something as those queries make money. People researching Douglas Fairbanks for a high school essay consume queries but don't generate ad revenue.
I know "search is hard" in the general sense but context is lacking (not a lot of details online from ex-search teams). It's always been apparent to me that you must have some other high-grossing product if you want to get into search or video, if only to pay for the servers.
Darknet Lantern is a decentralized searchable directory. It's probably not going to take off, but it could inspire something else. Servers spider other servers running the same software and synchronize their data.
Yup, directory services are a lot easier to do peer-to-peer. Pinboard.in is a good shared directory (sort of Yahoo! without the editorial). They can yield excellent quality when you're searching for something that someone has 'indexed' with them, but poor recall when it comes to the set of all possible answers.
Doing it peer to peer without editorial allows sites to 'get into' the index easily which has its own plusses and minuses.
Instead of each node storing the next- and previous-pointers separately, store a single pointer which is the XOR of the two. Which is obviously an invalid pointer. But when iterating, XOR the previous node's pointer with the combined pointer to get the next node's pointer, and so on. You can iterate this way in both directions. Feels illegal. :)
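For concreteness, here's a minimal sketch in C (hypothetical node layout and names; traversal only, and it ignores the strict-conformance caveats discussed further down):

```
#include <stdint.h>
#include <stdio.h>

/* Hypothetical XOR-linked node: `link` holds prev_address XOR next_address. */
struct node {
    int value;
    uintptr_t link;
};

/* Walk the list from `head` by remembering the node we just came from;
   the same function walks backwards if started from the tail. */
static void walk(struct node *head)
{
    struct node *prev = NULL, *cur = head;
    while (cur != NULL) {
        printf("%d\n", cur->value);
        /* next = prev XOR link */
        struct node *next = (struct node *)(cur->link ^ (uintptr_t)prev);
        prev = cur;
        cur = next;
    }
}

int main(void)
{
    /* Build a tiny three-node list a <-> b <-> c by hand. */
    struct node a = { 1, 0 }, b = { 2, 0 }, c = { 3, 0 };
    a.link = (uintptr_t)NULL ^ (uintptr_t)&b;
    b.link = (uintptr_t)&a   ^ (uintptr_t)&c;
    c.link = (uintptr_t)&b   ^ (uintptr_t)NULL;
    walk(&a);  /* forwards: 1 2 3 */
    walk(&c);  /* backwards: 3 2 1 */
    return 0;
}
```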
One thing this loses relative to a conventional doubly linked list is the ability to remove an item while only having its address (or any other iterator pointing to it that is stable across other items being inserted or deleted). And unfortunately that’s often why you’re using a doubly linked list in the first place.
(A less intrinsic flaw is that coding an XOR linked list in strictly conformant C is exceedingly annoying. The standard does not guarantee that equal pointers turn into equal integers upon casting, so you’re forced to make everything into an uintptr_t to, essentially, maintain a canonical integer-cast version.)
> The standard does not guarantee that equal pointers turn into equal integers upon casting, so you’re forced to make everything into an uintptr_t to, essentially, maintain a canonical integer-cast version.
Of course! The standard does not guarantee that the size of an int is the same as the size of a pointer, i.e. `sizeof(int) =/= sizeof(int*)`. IIRC, this was the case on some of the PowerPCs I worked on decades ago. Now with x86-64 & Aarch64 having taken over the world (and with saner 32bit on the embedded end) we've almost forgotten the Cambrian explosion of "interesting" word-size machines.
The whole point of the C standard is that it allows implementers flexibility in the sizing of the basic data types, to match a given machine's architecture, with the standard only defining relationships between types (e.g. char <= short <= int <= long). The only surprise is that it took so long for fixed-width types to be standardized (C99).
I meant integers in general, not ints specifically. That is to say, the standard does not guarantee that, if p and q are (say) void pointers and p == q, then (uintptr_t)p == (uintptr_t)q. (Neither does it guarantee that, if p == NULL, then (uintptr_t)p == 0, but I digress.) Mere size shenanigans are not enough for things to get that bad. What you need is for there to be multiple ways to represent a pointer to the same place in memory, like on real-mode x86 in the large memory model.
The practical result is, you need to write XOR-linked-list operations something like this:
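Roughly, a sketch of the shape this takes (illustrative names): every stored link, and the list head itself, is kept as a uintptr_t, so each pointer is converted to an integer exactly once and only that canonical value is ever XORed.

```
#include <stdint.h>

struct entry {
    uintptr_t link;   /* XOR of the canonical integer forms of prev and next */
    int value;
};

/* All traversal works on uintptr_t values; a pointer is cast to an integer
   once, when the node enters the list, and that bit pattern is reused. */
static uintptr_t next_of(uintptr_t prev, uintptr_t cur)
{
    return ((struct entry *)cur)->link ^ prev;
}

/* Walk forward from the list head; 0 acts as the end marker on both sides. */
static void walk(uintptr_t head)
{
    uintptr_t prev = 0, cur = head;
    while (cur != 0) {
        /* ... visit ((struct entry *)cur)->value ... */
        uintptr_t next = next_of(prev, cur);
        prev = cur;
        cur = next;
    }
}
```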
and you cannot, for example, make head, the sentinel node pointer, into a struct entry * instead; it has to be exposed as a uintptr_t (a canonical integer representation of that pointer).
Wow, TIL. So in the version unaware of this the list could be corrupted if the pointer you use, while correctly pointing to the head or tail of the list, happens to convert to a different bit representation for the XOR than the one you encoded into the node. Did you happen to ever see this in a real system?
I don’t have it in me right now to work out whether this can actually cause corruption, but as for equal pointers simply converting to unequal integers, sure:
$ cat mismatch.c
#include <stdio.h>
char abc[48*1024U], def[48*1024U];
int main(void) {
    void *p = &abc[sizeof abc], *q = &def[0];
    printf("%d\n", p == q); /* pointers are equal */
    printf("0x%.5lX == 0x%.5lX\n", /* linear addresses are equal */
           ((unsigned long)p >> 16 << 4) + (unsigned short)p,
           ((unsigned long)q >> 16 << 4) + (unsigned short)q);
    printf("0x%.8lX != 0x%.8lX\n", (unsigned long)p, (unsigned long)q);
    return 0;
}
$ # compile&link for DOS, 8086, huge memory model, create map file
$ wcl -bcl=dos -0 -mh -fm mismatch.c
Open Watcom C/C++16 Compile and Link Utility Version 1.9
[snip]
creating a DOS executable
$ egrep '_abc|_def' mismatch.map
0371:0000+ _abc
0f71:0000+ _def
$ emu2 mismatch.exe
1
0x10080 == 0x10080
0x0408C000 != 0x10080000
(I said large memory model earlier. That was incorrect: if you compile for the large memory model, with -ml, the first line of the output will be 0, because then pointer comparisons will not canonicalize pointers. You need the huge memory model for that. Both 0 and 1 are OK according to the standard, as it does not guarantee anything about comparing a pointer one element past the end of one object to a pointer to another object.)
On modern 64-bit machines, sizeof(int) != sizeof(int*) is very true. But there's probably a significant amount of code that assumes it's equal to sizeof(uint64_t) or sizeof(long). :)
True, but if you always keep a pair of adjacent pointers where before you would have kept just the first, efficient deletion and insertion come back. (You could even go the whole hog and encapsulate this pair as a new WidePointer type, with operator++() and operator--() internally operating on both, etc.)
This will likely be a small constant factor slower for some operations, of course, but if you need bidi traversal and node values are small compared to pointers, it's a solid space saving -- and the reduced cache pressure that comes with it may even make it a speed win overall.
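As an illustration (assuming an XOR-linked node like the one sketched above; the names are made up), keeping a (prev, cur) pair restores O(1) removal:

```
#include <stdint.h>
#include <stddef.h>

struct node {
    uintptr_t link;   /* prev XOR next */
    int value;
};

/* A "wide pointer": the node we're at plus the node we came from. */
struct cursor {
    struct node *prev;
    struct node *cur;
};

/* Remove cursor->cur from the list in O(1); afterwards the cursor points at
   the following node. prev is NULL at the head; if prev is NULL the caller
   must also update its head pointer, and cur can then be freed. */
static void remove_at(struct cursor *c)
{
    struct node *prev = c->prev, *cur = c->cur;
    struct node *next = (struct node *)(cur->link ^ (uintptr_t)prev);

    if (prev)  /* prev's link: (prevprev ^ cur) becomes (prevprev ^ next) */
        prev->link ^= (uintptr_t)cur ^ (uintptr_t)next;
    if (next)  /* next's link: (cur ^ nextnext) becomes (prev ^ nextnext) */
        next->link ^= (uintptr_t)cur ^ (uintptr_t)prev;

    c->cur = next;
}
```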
Storage can be further reduced if we think that, with a 64-bit processor, probably a 32-bit address space is enough for most applications (that require less than 4 GB of RAM).
Maybe we can go even deeper with 16-bit near/relative pointers. Perhaps data-oriented design fits well in this situation? With blocks of 64k elements and uint16 indices to address elements inside of them.
> with a 64-bit processor, probably a 32-bit address space is enough for most applications
This is related to the Java virtual machine's use of 32-bit "compressed ordinary object pointers (OOPs)" on a 64-bit platform. The pointer is 8-byte-aligned though, so it can address 32 GiB of memory. There is also a non-compressed-OOPs mode that can address more than 32 GiB.
I see that you're being silly, but the problem is conflating pointers and indices. 16 bit indices are fine. 16 bit pointers are terrible.
Separately, 32 bit pointers are a good optimization in many situations. Java will save lots of space by using 32 bit pointers until you go above 32GB or 64GB or more (depending on what you set as minimum object alignment).
A relative pointer with limited range is something that should rarely exist, and all too easily it can end up with a base that is no longer guaranteed to be adjacent. There are a lot of bonus landmines compared to just having a 64k element limit.
By "landmine" I mean situations where it gets very unpleasant to use because the data isn't guaranteed to be close enough to fit into one of those pointers. Even if it's true when the code is first written, minor updates might change that. I don't think something like a borrow checker can do much to help with that.
Let me put it this way: 16 bit indices are already rarely worth it. 16 bit pointers are 10x the trouble for 20% more situational coverage. It's not worth it. They are not similar levels of useful/terrible.
Indices all go in the same allocation, which makes a big difference. You're just putting a max size on your arrays, instead of making your memory allocations a lot more complicated. And if you need bigger, you'll know in a much more straightforward and deterministic way. It also doesn't require the same support from the language, whether that's explicit small pointer support or support for turning integers into pointers.
So essentially, you are using indices rather than pointers.
Indices have advantages: they can be more compact, and they are position-independent. But you need a base pointer, and a bit more computation. It also requires you to have your own allocator, as your typical heap allocator will just give you an arbitrary pointer to your object.
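A tiny sketch of that index-based flavour in C (illustrative names; a fixed-capacity pool plays the role of "your own allocator"):

```
#include <stdint.h>
#include <stdio.h>

#define CAPACITY 65535u       /* valid indices 0..65534 */
#define NIL      0xFFFFu      /* reserved "null index" */

struct item {
    int      value;
    uint16_t next;            /* index into the pool, not a pointer */
};

/* The pool is the single allocation; indices are relative to its base. */
static struct item pool[CAPACITY];
static uint16_t free_head = NIL;   /* free list, also built from indices */
static uint16_t used = 0;

static uint16_t item_alloc(void)
{
    if (free_head != NIL) {
        uint16_t i = free_head;
        free_head = pool[i].next;
        return i;
    }
    if (used < CAPACITY)
        return used++;
    return NIL;               /* pool exhausted: the "max size on your arrays" */
}

static void item_free(uint16_t i)
{
    pool[i].next = free_head;
    free_head = i;
}

int main(void)
{
    /* Build a two-element singly linked list out of indices. */
    uint16_t a = item_alloc(), b = item_alloc();
    pool[a].value = 1; pool[a].next = b;
    pool[b].value = 2; pool[b].next = NIL;

    for (uint16_t i = a; i != NIL; i = pool[i].next)
        printf("%d\n", pool[i].value);

    item_free(b);
    item_free(a);
    return 0;
}
```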
Re: data-oriented data structures... random (not XOR)...
I have been running multiple iterations of a port scan over the entire IPv4 space. For now I'm only scanning the top 25 most-used ports (nmap and masscan have top lists). I wanted to be able to do simple data analysis _very_ efficiently time-wise, and I'm OK with sacrificing memory for that.
If you want O(1) (time-wise) item insert (into a SORTED structure, btw), fetch, and lookup, and want port status checks to simply be a matter of bit-shifting (similarly with counting), then:
1. Bitfield array to store status of up to 32 ports (1 bit for state (open/closed) => 32 bit bitfield)
2. ...that's it. Each scan result is to be found at
`bitfields[(unsigned int) ipv4_address]`
In C:
```
// bitfield for port status, for each IP (needs <stdbool.h>)
struct port_field {
    bool p1:1;
    bool p2:1;
    bool p3:1;
    bool p4:1;
    bool p5:1;
    // in C, gotta write it out - of course we could use a macro to generate this
    ...
    bool p32:1;
};
```
This will use 16 GiB of memory for the whole mapped space:
```
// the whole mapped space: one 32-bit bitfield per possible IPv4 address
struct port_field *ip_space;        // e.g. calloc((size_t)1 << 32, sizeof *ip_space)

in_addr_t u32_ip;                   // unsigned 32-bit int, from <arpa/inet.h>
struct port_field *p_target_bitfield;
int *p_target_int;

// ... to insert:
if ((u32_ip = inet_addr(token)) == INADDR_NONE) { // `token` is a string with the IP (e.g. from a text file)
    printf("line %lu: IPv4 address not valid: %s\n", line_count, token);
} else {
    p_target_bitfield = &(ip_space[u32_ip]);      // unsigned int ipv4 as 'index'
    p_target_int = (int *) ((void *) p_target_bitfield); // cast bitfield* to int*
    // set bit at port_index:
    *p_target_int = (1 << port_index) | *p_target_int;
    // the bit for `port_index` can be read back with (*p_target_int >> port_index) & 1
}
```
It works - pretty nifty :) I'm sure I could make it much prettier though.
But it's a kind of 'columnar-bitfieldish' in-memory O(1) for everything :)
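For the counting side, a rough fragment (assuming the `ip_space` array and `struct port_field` above, plus <stdint.h> and <string.h>, and that the bitfield packs into 32 bits as on typical ABIs):

```
// Count how many hosts have the port at `port_index` open.
unsigned long hosts_with_port_open(const struct port_field *ip_space, int port_index)
{
    unsigned long count = 0;
    for (uint64_t ip = 0; ip < (1ULL << 32); ip++) {
        unsigned int bits;
        memcpy(&bits, &ip_space[ip], sizeof bits);  // reinterpret the bitfield as raw bits
        count += (bits >> port_index) & 1u;
    }
    return count;
}
```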
Hm I guess you're right - I'm misusing the term. And yeah! Will experiment with it; what's neat is that it's not that much code to mess around with these basic notions...
Re: 224 << 24 - you're right; so many unusable actual addresses. It's just kind of neat to actually map out whole ipv4 space to memory. But yes lots of it unneeded, I'll see if I can add minimum-computation-possible mapping translation so that everything still stays ~kind of O(1).
Thank you for your comments! :)
edit P.S. 25 x 512 MiB arrays - actually, thank you; I thought of doing something like that at first, but now forget why I didn't start experimenting with that sort-of-actual-columnar-store from the beginning. Anyway, it's nice to quickly mess around with multiple base data layouts (I'll try that one next, I think); I'd recommend it to anyone wanting to build up base knowledge on e.g. data layouts for data analysis...
> Instead of each node storing the next- and previous-pointers separately, store a single pointer which is the XOR of the two. Which is obviously an invalid pointer. But when iterating, XOR the previous node's pointer with the combined pointer to get the next node's pointer, and so on. You can iterate this way in both directions. Feels illegal. :)
Meh, it's not a pointer, it's not really any different from storing the difference between the two pointers, which obviously would let you iterate in both directions.
Yes, but as mentioned in TFA, storing the difference would require an extra bit. The difference between two 32-bit numbers is in the range [-(2^32-1), 2^32-1], needing 33 bits to store. The XOR is the same size as the original pointer, simplifying data flow.
But even so, storing a doubly linked list with only pointer differences and no absolute pointers (other than head and tail) feels illegal too
> storing the difference [of prev and next pointers] would require an extra [sign] bit [relative to XOR]
No it wouldn’t. Just let wraparound happen normally and things will work out.
Effectively what you need are functions
pair : ptr × ptr → uintptr
left : uintptr × ptr → ptr
right : ptr × uintptr → ptr
left(pair(x, y), y) ≡ x
right(x, pair(x, y)) ≡ y
Setting all three to XOR is one possibility. But
pair(x, y) = (x + y) mod 2^(width of ptr)
left(p, y) = (p - y) mod 2^(width of ptr)
right(x, p) = (p - x) mod 2^(width of ptr)
is an equally valid one, because addition and subtraction modulo any fixed value are related in exactly the same way as normal addition and subtraction.
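In C that might look like this (hypothetical node layout; uintptr_t arithmetic wraps modulo 2^width, so no sign bit is needed):

```
#include <stdint.h>

/* Hypothetical node that stores (prev + next) mod 2^width instead of prev XOR next. */
struct node {
    uintptr_t link;   /* pair(prev, next) = prev + next (wrapping) */
    int value;
};

static uintptr_t pair(uintptr_t x, uintptr_t y)  { return x + y; }  /* mod 2^w */
static uintptr_t left(uintptr_t p, uintptr_t y)  { return p - y; }  /* recovers x */
static uintptr_t right(uintptr_t x, uintptr_t p) { return p - x; }  /* recovers y */

/* Forward traversal looks just like the XOR version, but uses subtraction. */
static void walk(struct node *head)
{
    uintptr_t prev = 0, cur = (uintptr_t)head;
    while (cur != 0) {
        struct node *n = (struct node *)cur;
        /* ... visit n->value ... */
        uintptr_t next = right(prev, n->link);
        prev = cur;
        cur = next;
    }
}
```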
Uh, no. Assuming 32-bit pointers, you'd add or subtract them using normal unsigned arithmetic which is modulo 2^32. Essentially the same trick definitely works because all these operations (adding/subtracting/xoring a number) are invertible.
The neat thing about xoring with a number is simply that that operation is its own inverse.
One difference between XOR and difference is that because of the symmetry of XOR, you could use the same code for walking the list forwards and backwards.
Run a memtest. Graphics cards usually crash badly when given invalid data, which can happen sporadically if you have bad RAM.
If memtest shows a specific memory region as failing, swap out sticks to check which it is, and buy a new one. (Or if you're on a tight budget, you can disable that region with kernel boot options.)
If memtest gives errors in lots of places, might be a bad overclock. Loosen timings or give it more voltage.
> If memtest shows a specific memory region as failing, swap out sticks to check which it is, and buy a new one. (Or if you're on a tight budget, you can disable that region with kernel boot options.)
Honestly, I'd always do that as the first option. In most cases you can still get years of life out of that stick of ram (but do check regularly, like after a week and then double the interval every time if it didn't get worse).
If you kick someone out, you lose them as a customer, and they'll tell all their friends about the free play trick out of spite, so you'll have to patch the machine anyway.
You're making me wonder what the stats are for how many people try to abuse arcade machines in a country like Japan versus the United States. (Not that people in any country are gonna be entirely honest, but the entitlement to break the system and the comfort to brag about it seems cultural.)
In fact, that could be why some of the machines weren't better protected against that stuff in the first place, right?
There are some great scenes in Rebels of the Neon God [1992] by Tsai Ming-Liang (Taiwanese filmmaker) where the main characters steal the main pcbs from some arcade machines and try to resell them to the arcade owner lol. Wonderful film, recommend it - some great scenes in those arcades.
I know little about it and I'm not sure that it works. It might.
I do know that for regular std-using programs, std has its own panic strategy that it was compiled with, so even if you set your own code's panic strategy to abort, it still ends up pulling in unwind-related code from libstd. The solution to that is to also recompile libstd with the same parameters as your own code, which is what `-Z build-std` does.
I don't know if the same thing applies even to libcore, ie for no_std programs. FWIW, coincidentally I spent yesterday making a no_std panic=abort program for a custom freestanding target, and the only missing symbols it needed from me were memcpy and memcmp, even though the code does do panicking things like slice indexing. That program requires `-Z build-std=core` since it's for a custom freestanding target.
Has that really changed recently? Different people have always enjoyed different things, and some don't like puzzles. That was as true a hundred years ago as now.
Yes. But modern games also evolved to appeal to people who don't like to be challenged and look for a casual rewarding experience.
It becomes quite obvious when you play a game of the early 90s and compare it to gameplay of today.
If you couldn't identify(!) and solve the puzzle in a LucasArts Adventure Game, you were stuck; if you repeatedly died in Super Mario / Sonic the Hedgehog, you had to restart from Level 1.
No need to feel sorry. I never had a NES, I was actually thinking of Super Mario Land on GameBoy when I wrote this, but ultimately wrote it a bit more generic...
I do however have a vivid memory of LucasArts' Indiana Jones and the Last Crusade, where I got stuck in a cave under a Venice cafe for days, because I didn't know that I needed to talk a guest out of his wine bottle using specific sentences, and use the wine to soften the mud around a lever in a wall...
But I guess that's too specific for anyone to relate to...
I don't know about "recently", but things have certainly changed.
Even when there was information available in the past, it wasn't always easy to access or understand, and more critically, the willingness to share knowledge and the pace at which it spreads have increased tremendously.
If we look back, admittedly through the lens of nostalgia, at games like WoW:BC, then strategies took weeks or even months to bed in and disseminate from the top guilds to the wider public. RPGs, even in the gamefaqs era, took weeks or months to feel "solved".
Whereas now games are released and are often "solved" in beta before their official release. Game knowledge is spread so much faster now.
Even when we had early youtube back then, and we had IRC and gamefaqs, the knowledge just wasn't spread as fast. It's hard to know if that's just the sheer numbers of people involved now, or a greater willingness to share that knowledge.
Yes and no.
I think modern games lead more people to be like this, because of quest markers, GPS navigation, hints, and creative/story modes. There's also a heavy emphasis on the rewards rather than the quest.
I notice this myself when going back to older games like Final Fantasy 7 or 8 and misremembering how complex and complicated the UI, menus, and systems are.
You just get used to these things, but I also think they are a bit immersion-breaking and shift the focus a lot toward the destination rather than the journey.
Some game devs don't even want to risk the chance that players might have to stop and explore a little bit before they find the right path to continue the game.
That's not the fear. The fear is that the player will get stuck and simply stop playing because it's frustrating instead of fun.
The way that devs determine where the line is between "most people that play get frustrated and give up" and "most people figure it out and keep playing" is by playtesting and watching the metrics.
It's certainly possible to design a game where the intuition of the player matches a less contrived-looking world, but it's a lot harder. On the other hand, having a convention that climbable surfaces will be blazed yellow, and teaching it early in the game, is comparatively easy.
There are also aspects of pacing and focus to consider. A game that ebbs and flows in its intensity often feels better to play than one that's high intensity all the time. So if your game is mostly about combat, you'll want to break up big fights with lower energy experiences. But those lower energy experiences really aren't "the game" in a sense, they're a kind of filler that makes the high energy experiences feel more fun. It can't be completely boring, but it should be easier than the primary gameplay experience and also have lower stakes. That's a big reason why so many actiony games have blatantly obvious climbing and puzzles. It keeps the player lightly engaged while letting them catch their breath before the next set piece.
It's not the only way to build games by any means, but it is a generally effective, and consistently reliable, template.
But this is exactly one of the problems of today. Everything bases its design around UX metrics, which makes the end result boring, predictable, and the opposite of immersive.
I agree that there needs to be pacing, but there are great ways to do that.
GTA and RDR are great examples. You drive somewhere and get a funny conversation, or something happens on the way to the objective that pretty naturally distracts you in a way that makes sense in the world the game takes place in.
Ubisoft games are basically full of these boring filler activities. What point does a huge open world have, if it's just the same 5 things copy and pasted all over the place?
I'd much rather condense those 590 busy work tasks to 5 really nice side quests like in The Witcher 3. At least they give you an experience and not some UX concept of "the user needs to go climb a tree now".
It's a very interesting divide when you have games that worry about this, and then genres/subgenres like the "soulslike" that take pride in their difficulty and the precise maneuvering involved just to beat a boss.
To clarify how flagging works on HN: flagging only hides a comment once "enough" users flag it, and even then it remains visible to users with the showdead option set in their profile.
I think. I know N>1 because I sometimes flag something and it is not suppressed right away. Often though I flag something and then it is suppressed a few minutes later so I believe N=2 or N=3 or something small like that.
For that matter a lot of posts get suppressed right away, particularly from people who visit HN and immediately start posting stories from the same blog over and over again.
I think you can estimate the threshold by just counting the ratio of times you flag something and it immediately dies. You would wait a while to see which items die or do not die, and discard the subset which never die, and consider only the subset of items that transitioned to [flagged][dead].
If the [flagged] logic is simply "flags >= N", then of the subset of samples that are flag-killed with your involvement, you will be final flag in 1 of N of those.
The null hypothesis is "this experiment converges to a reciprocal integer".
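A toy Monte Carlo sketch of that estimate (every number in it is made up; it only checks that, when your flag timing looks like everyone else's, P(yours was the Nth flag | the item died with your involvement) comes out near 1/N):

```
#include <stdio.h>
#include <stdlib.h>

#define N_THRESHOLD 3     /* hypothetical kill threshold */
#define FLAGGERS    8     /* other users who flag a given item */
#define TRIALS      100000

int main(void)
{
    srand(12345);
    long involved = 0, final_flag = 0;

    for (int t = 0; t < TRIALS; t++) {
        /* Everyone (including "you") flags at a uniformly random time. */
        double mine = (double)rand() / RAND_MAX;
        int earlier = 0;   /* other flags that arrive before yours */
        for (int i = 0; i < FLAGGERS; i++) {
            double other = (double)rand() / RAND_MAX;
            if (other < mine)
                earlier++;
        }
        /* Your flag is number (earlier + 1); the item dies at flag N. */
        if (earlier + 1 <= N_THRESHOLD) {
            involved++;                       /* item died with your help */
            if (earlier + 1 == N_THRESHOLD)
                final_flag++;                 /* yours was the killing flag */
        }
    }
    printf("P(final | involved) = %.3f (expect ~%.3f)\n",
           (double)final_flag / involved, 1.0 / N_THRESHOLD);
    return 0;
}
```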
I feel like this requires an assumption that any post you flag is equally likely to be flagged by other users, and at a similar time as you, in other words that your flagging behavior is very similar to others that use this feature.
I guess this might give you a pretty nice upper estimate though. Unless N is very high / very few users use the feature, and you are frequently much later to the scene as the average user of the feature. Then N may be underestimated.
Hacker News is biased so that the threshold for flagging is very low, in order to maximize the effective separation of signal from noise as much as possible (for as broad a definition of "noise" as possible.) If I had to guess it's probably 3, or it's tied to user karma (flags from higher karma accounts count more.)
And certain topics are likely to be flagged by multiple people, this can be assumed based on the topics that people complain aren't "HN worthy." Anything that can be considered politics, for instance, is probably going to get flagged because of the number of people who don't believe political stories of any kind have a place here.
That's probably a decent approximation, but it assumes the distribution of latencies for flagging is the same for you and the aggregate "everyone else."
I think so. It depends however on how fast I think I am compared to the other flaggers. If I was really, really fast I could always be the first flagger.
Parody is fair use, and this is part of a submission to SIGBOVIK 2024. http://tom7.org/bovex/ It's also clearly not competing with the original, since its contents are partially nonsense.
> Maybe I’m old, but traditionally monitors were 4:3 until the mid to late 2000s, when they shifted to 16:10 and then 16:9.
They were, but I recall 4:3 monitors lacking the width to comfortably accommodate things like IDE project views AND code simultaneously.
I believe 16:10 was chosen as a nice aspect ratio for productivity, but 16:9 was chosen for movies/TV, and economies of scale and cost-reduction impulses meant they took over even in areas where they were a stupidly poor choice (observe all the laptops with massively fat bezels [1]).
Right on my cusp: I still remember salivating over one of those straight-from-the-Jetsons 4:3 LCDs (1997? '99? I was only 9 or 11, so I may have been seeing them late / out of context).
First 16:9 was probably 2006...got the first ever MacBook Pro as my graduation gift
It informed some conception I had of the iPad as a platonic ideal (first 4:3 monitor in years)