
>How do you dispute what prior employers said about you?

Are they not doing that already? Even with all the current processes, employers are taking a chance. If a candidate's profile is good, can't they take the same chance without all those processes?


Recruiters don't call up a business and just ask "how was X at their job?" They are given a list of referrals by the candidate. I am talking about situations where a toxic boss may say negative things for personal reasons. Currently, the candidate can filter these out by choosing people they assume will give positive reviews.

NO certificate or authority; I'm a nobody.

Take LinkedIn as an example, and the messages from recruiters inviting candidates to interview. Why can't they get the profile checked by hiring managers and send out an offer directly? Once the candidate accepts it, collect the documents, run background checks, and rescind if there is a discrepancy.


I simply think you're going to have a hard time skipping technical interviews. Every employer will want to see what you're capable of; if they have two candidates and one of them turns down a tech interview, the other candidate is almost certainly getting the job.

Whatever the end goal may be, MITM is called an 'attack', not 'research'.

I wouldn't last a single day at a company that asked me to do such things. I worked in IT for a national political party and left the job once I found out about its corrupt practices and scams.

If we, as engineers, collectively upheld ethics as part of our work culture, Meta wouldn't have attempted it.


As an ethical engineer, there is a further duty to also sabotage the organization once we uncover dirt on it. Never for profit. Sometimes for ego. And always because if every engineer took a stand against BS, then the world would be a much better place.


> as engineers collectively upheld ethics as part of work culture

Just saying, it's really hard when your job or even your future green card is on the line. When the grunt engineers are one mistake away from being sent out of the US and losing all their potential futures there, they are much more likely to bury their heads and carry out what they are told by their managers.

We need to go for the higher ups more.


Someone committing fraud for money is the same as someone committing fraud to keep a visa.


> MITM is called an 'attack', not 'research'

Sorry but what?


>The relationship with commercial vendors isn’t always healthy, but many major OSS projects are supported to a significant extent.

Almost always, the so-called "community" supporting an OSS project is an employee of a commercial vendor who is only interested as long as he is assigned to the project or task.

The solution is to have full-time owners and maintainers for all critical projects, and the government has to foot the bill. The government can set up a division to identify such projects.


Government: launches a years-long covert operation to take over maintainership of a critical project in order to insert a backdoor.

HN comments: the solution is for government to maintain these critical projects.


That does seem likely.


I mean, getting an actual government agency with an appropriate mission specified by law _would_ help. Both from a recruiting point of view (you get sufficiently ideologically motivated people), but also from an accountability point of view. These agencies are ultimately responsible to someone. And the law has that nice property of knowing who and how to hurt those people. So yeah. Getting a (or the) government to maintain OSS infrastructure definitely would help. And it would probably also make this kind of thing look far, far too risky to attempt.


I'm amazed we have gotten this far without something like that happening. Critical infrastructure is built on top of this pile of software that is all being maintained by a handful of volunteers. If every major piece of infrastructure (power plant, water treatment plant, etc.) dedicated one full-time engineer to one open source dependency that they use, there would be more than enough manpower to solve it.


We can't even support actual critical physical infrastructure anymore, like roads, bridges, and the power grid. And that stuff has very obvious immediate consequences when it breaks. Try explaining to your local octogenarian senator what xz is and why OpenSSH shouldn't just be funded by whatever spare change we find in the couch cushions.


Governments will just outsource it to commercial contractors at this point.


Honest question: why was there a need to start a new repo? Would you be OK with merging yours into Notepad++'s official repo [0] (both are in C++)? Did this cross your mind before, and what happened?

Not saying that they would allow it, but it'd help the community as a whole, with less duplication of work, and deliver more features.

https://github.com/notepad-plus-plus/notepad-plus-plus


I can't answer for the author, but keep in mind that Notepad++ is good because it uses the Win32 API directly. I don't see any future where they'd just accept replacing everything with Qt.


There's no need to. It installs, updates and runs exactly as it did on Windows; I use it every day.


It's a complete re-implementation from scratch; they don't share code, and using the same programming language is not particularly relevant.


Was wondering the same. Of course a lot of code is going to need to be new, moving away from Windows-specific APIs, but that's only a lower layer (or so it seems to me). Everything from the versatile search and replace (e.g. the regex engine) to the macros system, the syntax highlighting, the session management... all that code needn't have been rewritten; people clearly like it working the way it is.


Wow, I know what you are talking about. Is Meta-CVS available somewhere?


Why yes; the Meta-CVS page lives on:

https://www.kylheku.com/~kaz/mcvs.html

Let me warn you in advance that Meta-CVS isn't just a client you can use over CVS repos to get all those nice things. It uses CVS repos to store its own format. All the files are given UUID-like hexadecimal names, stored as a flat directory. The mcvs program checks that out into an MCVS/ subdirectory. (So there is an MCVS/CVS directory under that where the ,v files are.) The MCVS/MAP file is a Lisp data structure which maps the UUID names to paths. The program builds and maintains the working tree according to the MAP file.
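
To make the layout concrete, here's a toy Python sketch of the idea. The real MAP is a Lisp datum and mcvs itself is Common Lisp; the file names, entries, and helper here are invented purely to illustrate the flat-name-to-path mapping:

    import os
    import shutil

    # Hypothetical MAP entries: flat hexadecimal names -> working-tree paths.
    # (The real MAP is a Lisp data structure, not a Python dict.)
    MAP = {
        "F-0E9B2B1C5D8A4F7E9C3A1B2D4E5F6A7B": "src/main.c",
        "F-1A2B3C4D5E6F7A8B9C0D1E2F3A4B5C6D": "doc/README",
    }

    def build_working_tree(checkout_dir="MCVS", tree_dir="."):
        """Materialize the working tree: place each flat-named file from
        the MCVS/ checkout at the path the MAP records for it."""
        for flat_name, path in MAP.items():
            dest = os.path.join(tree_dir, path)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy(os.path.join(checkout_dir, flat_name), dest)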

Meta-CVS requires CLISP. Hopefully that's just "apt install clisp" nowadays, and similar.

When I developed Meta-CVS as my first nontrivial Lisp project, I soon learned about people who will try to "strongly advise" you to write in Perl or Python:

"Another downside to Meta-CVS is that it is written in Common LISP. Granted, Common LISP is a very powerful and flexible language. However, it is not very common (pardon the pun) on UNIX and other systems, and few people could be expected to install a client if they need to install it first. If Meta-CVS' author wishes to make it more popular, I strongly advise him to re-implement it in C, Perl, Python or something more standard." http://better-scm.shlomifish.org/docs/nice_trys.html


insert meme *Don't give me hope*

In an ideal world, you're the perfect employer and I should be your employee. I want to believe that these employers exist, but conventional wisdom says they are just the textbook version; the real world is different.

A little bit about me: in my 12 YOE (still at the same place, with steady growth), I worked six years on one team and built a framework. One SE kept dumping his new projects on me and moving to fresh ones while I got to maintain them; he soon moved to a different team. I stayed, maintained, stabilized, scaled, and tuned, and only after that did I leave for a different team (actually I'm the only one who did) to work on a new product. Again, the same situation: many have joined and left, a few after a year, and none of them contributed anything significant to the product. None of them cared about the product or the company; they were just optimizing for their career and pay. But then where's the passion in it if you don't care about what you've built and making it better?

I'm now pivoting to a new team (I don't know which yet), planning to build something from scratch again. I'm optimizing for my passion, but somewhere deep inside I'm worried about my future prospects because of the long haul.


We need more devs like you.


Thanks. I want to see more devs like me. I call it the owner's mindset.


There are two types of people: 1) those who trust everything on the internet, and 2) those who don't.

Whether it's free or premium, the second group still wouldn't trust the search or AI results; they apply their own due diligence. Sadly, that's a tiny percentage of the world's people now. We should strive to increase this number, and in the next decade or so it might mostly happen.


I no longer trust the benchmarks. Other than trying it out myself, what else can we do here?


It's already been done (Elo; see the LMSYS rankings). I hope we're cresting past the 50th percentile mark of people who haven't heard of it.
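
For anyone curious about the mechanism: at its core it's the standard Elo update applied to pairwise human votes between models. A minimal Python sketch (the K factor, starting ratings, and example matchup are illustrative; LMSYS's actual pipeline is more involved):

    def elo_update(r_a, r_b, a_won, k=32):
        """Return updated ratings after one head-to-head comparison."""
        expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
        score_a = 1.0 if a_won else 0.0
        r_a_new = r_a + k * (score_a - expected_a)
        r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
        return r_a_new, r_b_new

    # Example: model A (rated 1000) beats model B (rated 1100) in a blind vote;
    # the upset moves ~20 points from B to A.
    print(elo_update(1000, 1100, a_won=True))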


I see, thanks for the reference. Followed it on X now.

https://twitter.com/lmsysorg/status/1772759835714728217


GenAI novice here. What is the training data made of, and how is it collected? I guess no one will share details on it; otherwise it would make a good technical blog post with lots of insights!

>At Databricks, we believe that every enterprise should have the ability to control its data and its destiny in the emerging world of GenAI.

>The main process of building DBRX - including pretraining, post-training, evaluation, red-teaming, and refining - took place over the course of three months.


The most detailed answer to that I've seen is the original LLaMA paper, which described exactly what that model was trained on (including lots of scraped copyrighted data) https://arxiv.org/abs/2302.13971

Llama 2 was much more opaque about the training data, presumably because they were already being sued at that point (by Sarah Silverman!) over the training data that went into the first Llama!

A couple of things I've written about this:

- https://simonwillison.net/2023/Aug/27/wordcamp-llms/#how-the...

- https://simonwillison.net/2023/Apr/17/redpajama-data/


Wow, that paper was super useful. Thanks for sharing. Page 2 is where it shows the breakdown of all of the data sources, including % of dataset and the total disk sizes.


My question was specific to the Databricks model. If it followed Llama or OpenAI, they could add a line or two about it and make the blog post complete.


They have a technical report coming! Knowing the team, they will do a great job disclosing as much as possible.


The training data is pretty much anything you can read on the internet plus books.

This is then cleaned up to remove nonsense, some technical files, and repeated files.

From this, they tend to weight some sources more heavily - e.g. Wikipedia gets a pretty high weighting in the data mix. Overall, these data mixes run to multiple trillions of tokens.

GPT-4 apparently trained on multiple epochs of the same data mix, so I would assume this one did too, as it's a similar token count.
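
As a rough illustration of what "weighting sources in the mix" means in practice, here's a toy Python sketch. The source names and weights are invented for the example (see the LLaMA paper for a real mix); the idea is just that each training document is drawn from a source in proportion to its weight, so a small corpus like Wikipedia can be seen for more than one epoch:

    import random

    # Hypothetical mix: (source name, sampling weight). Weights sum to 1.
    DATA_MIX = [
        ("common_crawl", 0.67),
        ("code",         0.15),
        ("wikipedia",    0.10),  # small corpus, upweighted -> repeated epochs
        ("books",        0.08),
    ]

    def sample_source(rng=random):
        """Pick which source the next training document is drawn from."""
        names, weights = zip(*DATA_MIX)
        return rng.choices(names, weights=weights, k=1)[0]

    print(sample_source())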


https://arxiv.org/abs/2305.10429 found that people are overweighting Wikipedia, and that downweighting Wikipedia improves things across the board, INCLUDING PREDICTING THE NEXT TOKEN ON WIKIPEDIA, which is frankly amazing.


Personally, I found looking at open source work to be much more instructive for learning about AI and how things like training data are put together from the ground up. I suspect this is because training data is one of the bigger moats an AI company can have, and because of all the class action lawsuits surrounding training data.

One of the best open source datasets freely available is The Pile by EleutherAI [1]. It's a few years old now (~2020), but they did some really diligent work in putting together the dataset and documenting it. A more recent and even larger dataset is the Falcon-RefinedWeb dataset [2].

[1]: https://arxiv.org/abs/2101.00027

[2]: https://arxiv.org/abs/2306.01116
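
If you want to poke at one of these yourself, something like the following Python sketch should work with the Hugging Face datasets library. Hedged: the hub ID and the "content" column name follow the RefinedWeb dataset card as I remember it, and availability may have changed:

    from datasets import load_dataset

    # Stream the dataset rather than downloading all of it (it's huge).
    refined_web = load_dataset("tiiuae/falcon-refinedweb",
                               split="train", streaming=True)

    # Peek at the first few documents' text.
    for i, doc in enumerate(refined_web):
        print(doc["content"][:200])
        if i == 2:
            break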

