
Making their APIs as easy to use as they were 10 years ago would be equivalent to giving away their core assets. In the past you could do almost anything with the Facebook API that you could do with their web or mobile app.

They release a lot of open source stuff, as other commenters have mentioned, but you can't build a Facebook or Instagram competitor just by integrating those components.


I've used both Code Search and Livegrep. No, Livegrep does not even come close to what Code Search can do.

Sourcegraph is the closest thing I know of.


Agreed. There are some public building blocks available (e.g. Kythe or Meta's Glean), but having something generic that produces the kind of experience you can get on cs.chromium.org seems impossible. You need such bespoke build integration across an entire organization to get there.

Basic text search, as opposed to navigation, is all you'll get from anything out of the box.


In a past job I built a Code Search clone on top of Kythe, Zoekt, and LSP (for languages that didn't have Bazel integration). Another colleague helped me build the UI on top of Monaco. We created a demo that many people loved, but we didn't productionize it for a few reasons (it was an unfunded hackathon project, and the company was considering another solution when it already had Livegrep).

Producing the Kythe graph from the Bazel artifacts was the most expensive part.

Working with Kythe is also not easy, as there is no documentation on how to run it at scale.


Very cool. I tried to do things with Kythe at $JOB in the past, but gave up because the build (really, the many many independent builds) precluded any really useful integration.

I did end up making a nice UI for vanilla Zoekt, as I mentioned elsewhere: https://github.com/isker/neogrok.


I see most replies here are mentioning that build integration is what is mainly missing from the public tools. I wonder if Nix and nixpkgs could be used here? Nix is a language-agnostic build system, and with nixpkgs it has build instructions for a massive number of packages. Artifacts for all packages are also available via Hydra.

Nix should also have enough context so that for any project it can get the source code of all dependencies and (optionally) all build-time dependencies.


Build integration is not the main thing that is missing between Livegrep and Code Search. The main thing that is missing is the semantic index. Kythe knows the difference between this::fn(int) and this::fn(double) and that::fn(double) and so on. So you can find all the callers of the nullary constructor of some class, without false positives of the callers of the copy constructor or the move constructor. Livegrep simply doesn't have that ability at all. Livegrep is what it says it is on the box: grep.
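
As a contrived sketch (hypothetical class and file names, nothing from a real codebase), this is the kind of distinction a semantic index makes and grep cannot:

    // widget.h
    class Widget {
     public:
      Widget() {}               // nullary constructor
      Widget(const Widget&) {}  // copy constructor
      Widget(Widget&&) {}       // move constructor
      void fn(int) {}
      void fn(double) {}
    };

    // caller.cc
    #include "widget.h"
    void use() {
      Widget a;      // the index links this to Widget(), not the copy or move constructor
      Widget b(a);   // ...and this to Widget(const Widget&)
      a.fn(1);       // resolves to fn(int)
      a.fn(1.0);     // resolves to fn(double), a distinct node in the index
    }

"Find callers of fn(double)" then returns exactly one line, while a text search for "fn(" returns all of them and leaves the sorting to you.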


The build system coherence provided by a monorepo with a single build system is what makes you understand this::fn(double) as a single thing. Otherwise, you will get N different mostly compatible but subtly different flavors of entities depending on the build flavor, combinations of versioned dependencies, and other things.
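
A toy illustration of those "subtly different flavors" (the flag and names here are made up): one declaration in source can denote different entities depending on how it was built.

    // scalar.h
    #ifdef USE_FIXED_POINT
    using scalar = long;     // build flavor A: geo::fn is fn(long)
    #else
    using scalar = double;   // build flavor B: geo::fn is fn(double)
    #endif

    namespace geo {
    void fn(scalar x);       // one line of source, two incompatible entities
    }  // namespace geo

With a single coherent build, the indexer knows which flavor was actually compiled; across N repos and N flag sets it ends up indexing several versions of geo::fn and has to guess which one you meant.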


Sure. Also, if you eat a bunch of glass, you will get a stomach ache. I have no idea why anyone uses a polyrepo.


The problem with monorepos is that they're so great that everyone has a few.


God that is good.


Nix builds suck for development because there is no incrementality there. Any source file changes in any way, and your typical nix flake will rebuild the project from scratch. At best, you get to reuse builds of dependencies.


Is there like a summary of what's missing from public attempts and what makes it so much better?


The short answer is context. The reason Google's internal code search is so good is that it is tied into their build system. This means that when you search, you know exactly which files to consider. Without that context, you are making an educated guess about which files to consider.


How exactly does integration with the build system help Google? Maybe you could give a specific example?


Try clicking around https://source.chromium.org/chromium/chromium/src, which is built with Kythe (I believe, or perhaps it's using something internal to Google that Kythe is the open source version of).

By hooking into C++ compilation, Kythe is giving you things like _macro-aware_ navigation. Instead of trying to process raw source text off to the side, it's using the same data the compiler used to compile the code in the first place. So things like cross-references are "perfect", with no false positives in the results: Kythe knows the difference between two symbols in two different source files with the same name, whereas a search engine naively indexing source text, or even something with limited semantic knowledge like tree sitter, cannot perfectly make the distinction.
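
A made-up example of why that matters for macros (the macro and names are purely illustrative):

    #include <string>

    struct Request {};

    // Token pasting means the generated function name never appears literally in the source.
    #define DEFINE_HANDLER(name) void Handle##name(const Request& req)

    DEFINE_HANDLER(Login) {}   // defines HandleLogin

    void route(const std::string& path, void (*handler)(const Request&)) {}

    int main() {
      // A text index only sees "HandleLogin" here; an index built from the compiler's
      // view of the expanded code also links it to the definition generated above.
      route("/login", HandleLogin);
      return 0;
    }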


Yes, the clicking around on semantic links on source.chromium.org is served off of an index built by the Kythe team at Google.

The internal Kythe has some interesting bits (mostly around scaling) that aren't open sourced, but it's probably doable to run something on chromium scale without too much of that.

The grep/search box up top is a different index, maintained by a different team.


If you want to build a product with a build system, you need to tell it what source to include. With that information, you know which files to consider, and if you are dealing with a statically typed language like C or C++, you have build artifacts that can tell you where an implementation was defined. All of this takes the guesswork out of answering questions like "Which foo() implementation was used?"
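
A tiny sketch of that (file and target names hypothetical): the same symbol can have several implementations in the tree, and only the build graph says which one a given binary links.

    // foo.h
    int foo();

    // foo_linux.cc   (only compiled into the Linux server target)
    int foo() { return 1; }

    // foo_embedded.cc   (only compiled into the embedded target)
    int foo() { return 2; }

    // main.cc
    #include "foo.h"
    int main() { return foo(); }   // which definition? the build graph knows, the text alone doesn't

With the build artifacts in hand, "jump to definition" can resolve this per target instead of listing every file that happens to define a foo().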

If all you know are repo branches, the best you can do is return matches from different repo branches with the hopes that one of them is right.

Edit: I should also add that with a build system, you know what version of a file to use.


Google builds all the code in its monorepo continuously, and the built artifacts are available to the search. Open source tools are never going to incur the cost of actually building all the code they index.


The short summary is: It's a suite of stuff that someone actually thought about making work together well, instead of a random assortment of pieces that, with tons of work, might be able to be cobbled together into a working system.

All the answers about the technical details or better/worseness mostly miss the point entirely - the public stuff doesn't work as well because it's 1000 providers who produce 1000 pieces that trade product coherence for integration flexibility. On purpose, mind you, because it's hard to survive in business (or attract open source users, if that's your thing) otherwise.

If you are trying to make "code review" and "code search" work together well, it's a lot easier to build a coherent, easy to use system that feels good to a user when you only have to make two things total work together and the product managers talk directly to each other.

Most open source doesn't have product management to begin with, and the corporate stuff often does but that's just one provider.

They also have a matrix of, generously, 10-20 tools with meaningful market share they might need to try to work with.

So if you are a code search provider trying to make a code search tool integrate well with any of the top 20 code review tools, well, good luck.

Sometimes people come along and do a good enough job abstracting a problem that you can make this work (LSP is a good example), but it's pretty rare.

Now try it with "discover, search, edit, build, test, release, deploy, debug", etc. Once you are talking about 10x10x10x10x10x10x10x10 combinations of possible tools, with nobody who gets to decide which combinations are the well lit path, ...

Also, when you work somewhere like Google or Amazon, it's not just that someone made those specific things work really well together, but often, they have both data and insight into where you get stuck overall in the dev process and why (so they can fix it).

At a place like Google, I can actually tell you all the paths that people take when trying to achieve a journey. So that means I know all the loops (counts, times, etc) through development tools that start with something like "user opens their editor". Whether that's "open editor, make change, build, test, review, submit" or "open editor, make change, go to lunch", or "open editor, go look at docs, go back to editor, go back to docs, etc".

So I have real answers to something like "how often do people start in their IDE, discover they can't figure out how to do X, leave the IDE to go find the answer, not find it, give up, and go to lunch". I can tell you what the top X where that happens is, and how much time is or is not wasted through this path, etc.

Just as an example. I can then use all of this to improve the tooling so users can get more done.

You will not find this in most public tooling, and to the degree telemetry exists that you could generate for your own use, nobody thinks about how all that telemetry works together.

Now, mind you, all of the above is meant as an explanation - I'm trying to explain why the public attempts don't end up as "good". But to me, good/bad is all about what you value.

Most tradeoffs here were deliberate.

But they are tradeoffs.

Some people value flexibility more than coherence, or whatever. I'm not gonna judge them, but I can explain why you can't have it all :)


Henry Kissinger has a heavy German accent that didn't prevent him from becoming one of the most influential American politicians of the 20th century.


Henry Kissinger was also known as a ladies man. Once he was in his hotel room with a pretty woman when a world crisis broke out that required his attention. However, he was not answering the phone, so a desk clerk was sent up to the room. He knocked on the door and said "Mr Kissinger, I have a message for you". From behind the door he heard, "Go avey!" but it was important so he knocked again and said "Mr Kissinger, it is urgent that I speak to you!" and again "Go avey!" so for the third time he said "It is urgent, are you Kissinger!?" and the reply "No! I'm fuckingher! Now go avey!"

My gf's mother told me that joke back in the day, with a very heavy South American accent, but it still worked, maybe a little better because she said "Kissin-gher".

I saw Henry himself just a few years ago, right before Covid, in a NYC restaurant. He's extremely old, but he seemed very together.

You might call him a "statesman", but he wasn't precisely a politician. Also, the post-WWII/Cold War era opened the door, so to speak, for a large number of displaced European scholars to give advice about Eastern and Central European issues, advanced science, etc. Zbigniew Brzezinski and Wernher von Braun also come to mind.


I worked on something like this more than a decade ago, back when Bayesian classifiers were SOTA for sentiment analysis. It's relatively easy to use an existing model to bootstrap and fine-tune a new one. The hardest part is collecting and cleaning up the data for training.


That’s exactly it, the thought of finding and validating 5000 angry toots makes me feel sick!


Unfortunately this has already been happening for a decade or more since the NATO "intervention" in Libya and it has little to do with climate change.


The NATO intervention has nothing to do with this. The previous dictator created the threat to dissuade the EU from removing him. Now that he's out of the picture, the inevitable happened.

A good lesson is to stop propping up dictators in exchange for short-term stability. Based on current events, the EU has not learned that lesson.


No, the lesson is to stop interfering in other countries' internal affairs. The so-called refugees all originate from countries that suffered western military operations. "Humanitarian" intervention is the new colonialism.


Pakistan did not suffer "Western military operations", at least not recently or at a large scale, and half of the refugees on the ship were probably Pakistani.

"Humanitarian" intervention has always been an excuse to conquer, at least from the times of the Persian Empire or so.

And there have been refugee waves wherever there has been war and famine - that is, all over the globe - with nothing explicitly singling out "western interventions" as the cause.


The vast majority of refugees crossing the Mediterranean are from countries torn apart by western-instigated civil unrest or war: Syria, Iraq, Afghanistan, Libya, and various north African countries subject to "color revolutions". This isn't even remotely controversial. Pakistanis, Indians, and Nepalese generally fly in with tourist or student visas and then try to find work.


Read the article; on that specific ship, it is believed that half were from Pakistan.


Even if this particular ship is an outlier, the broader point stands regarding the make-up of refugee populations.

And concerning Pakistan, former PM Imran Khan stated his fall from power was caused by US interference. That counts as major.


I'm interested in this space and other mining-adjacent industries in the DRC-Zambia copper belt, given the region's potential to be a key player in central Africa's development and in the world's move to electric vehicles.

The main challenge is getting capital.

Let's get in touch!


The CMU Database Group's YouTube channel has very good material on this. I highly recommend the Intro to Database Systems videos: https://www.youtube.com/c/CMUDatabaseGroup/playlists


Thank you for sharing this link. The Internet Archive is truly a marvel of unequaled value. Didn't know it had such a trove of warez without the risk of pwnage that warez and torrent sites came with.


How do you know these are free of any malware?


For the Windows ones, the SHA sums are available online to compare against.


You can make SHA sums of malware.


If the ISOs are untouched (so it won't work for the posted "Delta Edition"), you can search the SHA hash of the version from MSDN. Relevant search keywords are "Microsoft SHA1 Hash Archive" :D


Software is already having a big impact. Just think of all the tools that allowed us to work from home and not commute during the pandemic and how different the world would have been if we did not have them.

An area where software will have a huge impact is crypto mining. As a community we should be proactive in reducing the amount of energy wasted on crypto mining. If you think about it, only the rich have the ability to mine lots of crypto coins, and these are people who could have used the same investment to do more positive things in their communities, like donating computers to local schools, or let the electric power they consume be used for more meaningful purposes instead of buying yet another 10 GPUs and plugging them into the grid. NVIDIA was on the right track with their firmware changes to slow down mining.


Maybe the better question to ask is "why isn't file revision control adopted more outside of software engineering?"

Git is a distributed version control system that most people shouldn't have to deal with. It has awful tooling and doesn't work well with binary files, large files, or moved files. Just one example: how do you diff two copies of an Excel document with Git?

Most people don't have to deal with multiple "master" branches and no single source of truth, which is really what Git is built for.

What most people need, including software engineers working in closed source, is a single source of truth with good history, diff and large files support.

I believe a tool like Dropbox, or even centralized version control systems like Subversion or Perforce, is better positioned to solve this than Git.

Many cloud-based tools already have features that give them the upper hand over Git. For example, Google Docs allows collaborative editing, and editors have access to file history and can revert to specific versions.

Maybe there are tools to diff binary files like two versions of an audio file or two excel documents or whatever two [domain specific file format] documents.

I would be happy if version control came at the file system or cloud drive level and apps just leveraged that integration seamlessly, instead of forcing everyone to learn the difference between branching, rebasing, cloning, copying, stashing, etc.


Binary diffs are not useful unless the diff software understands the underlying format. Most formats are linear but many are not, and many formats permit different binary encodings of equivalent data. Imagine, for example, archives created from the same directory tree but by different implementations. There's no reason to develop this feature in Git, because that's a lot of maintenance for little to no use. I don't see why spreadsheets should be any different. If they contain plain text that should be diffable, you can strip the non-printable characters and pipe the result to diff.

