Stack overflow knockoff for machine learning, NLP, AI, ... (metaoptimize.com)
152 points by finin on June 30, 2010 | 30 comments

Very unfair to call it a knockoff; it's waaaay more than that. The OSQA project is awesome, and rick & heranin (devs) are great guys, always looking for more help. My ex-SE site will be using OSQA, or is :) , and I will make the final migration soon. If you're local to SF, visit SF Answers! Good job bravura - debug

But it is a knockoff. They've copied the look and function of the Stack Overflow sites in detail - INCLUDING the bad design decisions, which is always a clear indication (not that we needed it) of a duplication.

I didn't mean "knockoff" to be disparaging, just that the design and functionality are very similar.

ok, thanks for the clarification :)

Why does it matter if it's a knockoff or not? If Stackoverflow can't get this type of deep dive into this community, why can't someone else try? Sometimes you just have to take things into your own hands. It's Joel and Jeff's fault for not opening their code to others like they said they would...

I fully support the decision to make this site, especially if it's not around yet.

Already the site has its first scoop!

Question: What little-known non-convex optimization trick has been used in most Berkeley NLP papers since 2006?

Answer: http://metaoptimize.com/qa/questions/14/what-are-the-state-o...

I am the person who built this site. I wasn't planning on announcing it until I had disseminated it more widely in academic circles, because I wanted to establish a core, highly technical user base first, but I guess this is fine. The quality of the users coming from HN has been great.

What people are saying about MetaOptimize Q+A:

Ryan McDonald (Google): "A tool like this will help disseminate and archive the tricks and best practices that are common in NLP/ML, but are rarely written about at length in papers."

Aria Haghighi (Berkeley): "Both NLP and ML have a lot of folk wisdom about what works and what doesn't. A site like this is crucial for facilitating the sharing and validation of this collective knowledge."

Bob Carpenter (Alias-I): "Par for the course, it’s a mix of wildly general (non-convex optimization) and reasonably specific (testing a random number generator) questions." (http://lingpipe-blog.com/2010/06/29/training-examples-a-stac...)

I'm targeting machine learning, natural language processing, vision, AI, statistics, data mining, neuroscience, and other data-driven fields. As we've learned from StackOverflow, having a broad topic means that information cross-pollinates between groups that don't normally communicate. That lack of communication is particularly acute in academia.

It's a site for scientists to share knowledge and techniques, to document our ideas in an informal online setting, and to discuss details that don't always make it into publications.

Also, I've gotten a handful of job offers through answering questions on Quora. So hopefully this will connect people with gigs they like.

Why should you sign up and post a question or answer?

* Communicate with experts

* Cross-pollinate information with experts in adjacent fields

* Answer a question once publicly, instead of potentially many times over email

* Share knowledge to create additional impact beyond conference or journal publication

* Find new collaborators

* Get job offers and gigs

The site is powered by OSQA. (http://osqa.net) I think it's unfair to the core developers to call it a StackOverflow knockoff, given that StackOverflow is---like most software---itself derivative.

Thank you, really, for that. I was trying to use Quora for this purpose, but it's not specific enough. As a machine learning PhD student from somewhere far from most good research centers (I'm in Brazil, and how many Brazilian ML papers have you seen in NIPS/ICML recently?), I struggle a lot with this folk wisdom. Most professors around here haven't really interacted enough with the international ML community to be up to date, and I often find myself recommending papers to my advisor and his peers. This can save me a lot of wasted time and effort; more than once I have spent a couple of months trying to solve a subproblem of an idea I had, only to find out it is (a) trivial or (b) impossible with the current state of the art. Being able to find out which of these is true in a couple of days does wonders.

I'm trying to disseminate this site to my peers and professors, to see if it will help people around.

How is it NOT a knock-off?

- The functionality of Q/A seems to be exactly the same
- The visual design is almost indistinguishable from that of StackOverflow
- The classification of questions (votes/answers/views with tags) is identical
- The badges that users can earn are a blatant copy from SO
- "First time here? Check out the FAQ!". Hmmm, where have I seen that before...?
- etc.

There may be some examples where the derivative vs. knock-off classification is debatable, but here, for me, the answer is clear.

Please note that I'm not making a judgement on whether this is better or worse than SO, and I'm not making a judgement on the skills of the developers. But building something that so clearly builds on someone else's work, without any attribution that I could see, leaves a bad taste in my mouth.

Unless of course I don't know the whole story and SO ripped off someone else. I'm awaiting enlightenment...

Your email validation links appear to be broken -- I keep getting a 404 when I click mine.

Sorry about that. I will contact the OSQA developers upstream.

The software wasn't designed to work out of a subdir, so we're still ironing issues out on that front.

[edit: This happens for reasons that neither I nor the core devs understand: http://jira.osqa.net/browse/OSQA-204 ]

Yeah, there's an extra /qa/ in the URL. Removing that works.

Thanks a lot for that info. It worked for me too.

Are data dumps available?

Judging from this: http://news.ycombinator.com/item?id=1477725 - DuckDuckGo integration with StackOverflow is now live - I suspect the gentleman above ( http://www.gabrielweinberg.com/blog/ ) has something nice in mind ;-)

Awesome! Really excited about this. Stackoverflow hasn't been that great a place to ask ML/IR/NLP questions and expect good answers.

Maybe this site will bring together all the ML people and do a better job.

Visually, the site is not that great. The logo is unreadable at first attempt

Visually, this site is too much like Stack Overflow, including the things that are not great about SO's design: some fonts are too large, things that are text should be buttons, and the wrong amount of emphasis is placed on certain information because the font sizes are wrong.

I liked the logo, especially the idea behind it. I think a darker color or thicker lines in the circles might help, though.

Or change your feature extractors :)

This is being referred to as a knockoff, but I assumed it was actually based on Stack Exchange. Can someone clarify?

The site is powered by OSQA (http://osqa.net), a Django/Python Q&A platform. OSQA is a fork of CNPROG, which was designed to mimic StackExchange.

OSQA is supported by DZone, and the pace of development has been rapid. The core developers have stated on many occasions that they are moving in an independent direction from StackExchange.

In particular, because the site is open-source, I can experiment with adding NLP to it. I can improve the Related Questions, I can automatically infer tags, and I can implement techniques for helping you organize and navigate information.

Joel & Jeff are selectively rolling out new StackExchange sites.

The selection process for new sites is community-driven:


. . . but it looks like a stats site is close to being a reality:


There are several problems with that approach:

As Chris Manning (Stanford NLP professor) says, Area 51 hasn't gotten any buy-in from the academic community. I have focused on getting academia to be the immediate core of the community, so that the quality of Q+A is high. I am able to do this because of my academic connections.

They are fragmenting the Q+A sites into four:
http://area51.stackexchange.com/proposals/33/statistical-ana...
http://area51.stackexchange.com/proposals/6607/artificial-in...
http://area51.stackexchange.com/proposals/2761/natural-langu...
http://area51.stackexchange.com/proposals/7607/machine-learn...

The last thing we need is NLP and ML people communicating less. That's why my site encompasses all of these proposals, as well as adjacent fields. As we've learned from StackOverflow, having one site for a broad topic leads to cross-pollination of ideas between groups who don't normally communicate.

Most importantly, OSQA (which powers my site) is an open platform, built on Django+Python. That comes with all the benefits of open software. In particular, because the site is open-source, I can experiment with adding NLP to it. I can improve the Related Questions, I can automatically infer tags, and I can implement techniques for helping you organize and navigate information.

"Related Questions": tokenize => random hash/project (tokens) => TF-IDF => KD-tree lookup

"automatically infer tag": tokenize / shingle q&a, ORDER token+bigrams BY TF-IDF(token + bigrams)

In both cases a global IDF estimate can be held in memory using a Counting Bloom Filter (or a traditional solr index).
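A rough Python sketch of the two ideas above, sharing one in-memory document-frequency store. Everything here is an assumption for illustration, not OSQA code: the names `terms`, `CountingBloomFilter`, and `Index` are hypothetical, the filter sizes are arbitrary, and the random hash/projection plus KD-tree lookup is swapped for a plain brute-force cosine scan to keep the example dependency-free.

```python
import hashlib
import math
import re
from collections import Counter


def terms(text):
    """Lowercase unigrams plus adjacent-word bigrams."""
    toks = re.findall(r"[a-z0-9]+", text.lower())
    return toks + [" ".join(pair) for pair in zip(toks, toks[1:])]


class CountingBloomFilter:
    """Approximate counter: hash collisions can only inflate a count."""

    def __init__(self, size=1 << 16, num_hashes=3):
        self.size, self.num_hashes = size, num_hashes
        self.counters = [0] * size

    def _slots(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for slot in set(self._slots(item)):
            self.counters[slot] += 1

    def count(self, item):
        return min(self.counters[slot] for slot in self._slots(item))


class Index:
    """Hypothetical helper: TF-IDF tag suggestions and related questions."""

    def __init__(self, docs):
        self.docs = list(docs)
        self.df = CountingBloomFilter()  # global document frequencies
        for doc in self.docs:
            for term in set(terms(doc)):
                self.df.add(term)

    def idf(self, term):
        df = min(self.df.count(term), len(self.docs))
        return math.log((1 + len(self.docs)) / (1 + df))

    def weights(self, text):
        """Sparse TF-IDF vector as {term: weight}."""
        return {t: n * self.idf(t) for t, n in Counter(terms(text)).items()}

    def suggest_tags(self, text, k=3):
        """Top-k unigrams/bigrams ordered by TF-IDF (ties alphabetical)."""
        w = self.weights(text)
        return sorted(w, key=lambda t: (-w[t], t))[:k]

    def related(self, text, k=3):
        """Indices of the k stored docs most cosine-similar to `text`."""
        q = self.weights(text)
        q_norm = sum(x * x for x in q.values())
        scored = []
        for i, doc in enumerate(self.docs):
            d = self.weights(doc)
            dot = sum(w * d.get(t, 0.0) for t, w in q.items())
            norm = math.sqrt(q_norm * sum(x * x for x in d.values()))
            scored.append((dot / norm if norm else 0.0, i))
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [i for _, i in scored[:k]]
```

On a real corpus the linear scan in `related` would be the first thing to replace, for example with dense hashed vectors and a KD-tree as suggested above; the sketch only shows how the pieces fit together.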

and... you won't get shut down :)

I tried to register and the verification link from/for the mail gives me a 404.

Edit: Easy to fix. Replace http://metaoptimize.com/qa/qa/account/validate/ by http://metaoptimize.com/qa/account/validate/
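For anyone who wants to script that workaround, here is a minimal Python sketch. It only rewrites a link on the reader's side and assumes the duplicated subdirectory is exactly `/qa`; the function name `drop_repeated_prefix` is hypothetical and has nothing to do with OSQA's own code.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_repeated_prefix(url, prefix="/qa"):
    """Collapse a doubled leading path prefix such as /qa/qa/ into /qa/."""
    parts = urlsplit(url)
    path = parts.path
    doubled = prefix + prefix + "/"
    if path.startswith(doubled):
        # Keep one copy of the prefix and everything after it.
        path = path[len(prefix):]
    return urlunsplit(parts._replace(path=path))


fixed = drop_repeated_prefix("http://metaoptimize.com/qa/qa/account/validate/")
```

Links that already have a single `/qa/` pass through unchanged.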

Tricky to fix, actually: http://jira.osqa.net/browse/OSQA-204

I am getting complaints that validation email links don't work (they have the subdir twice, as you mentioned), even though welcome email links work just fine.

This is weird because they both use the exact same link in the template:

forum/skins/default/templates/auth/welcome_email.html: <a style="{{ a_style }}}" href="{% fullurl auth_validate_email user=recipient.id,code=validation_code %}">{% trans "Validate my email address" %}</a>

forum/skins/default/templates/auth/mail_validation.html: <a style="{{ a_style }}}" href="{% fullurl auth_validate_email user=recipient.id,code=validation_code %}">{% trans "Validate my email address" %}</a>

Perhaps you could include instructions to remove the extra "qa/" in the mail you send? Ugly, but much better than having the validation link 404 in the face of new users. I immediately got a bad impression of the site when I saw the 404.

I'm not getting any emails, including the welcome one. It's not showing up in my spam folder in Gmail either.

Jeff/Joel really buggered up. They did a bit of course correction and ended up even further off course. Oh well. The platform is important, but having the skills and personality required to build a bonza community is more important.

Do we have a reasonable measurement of an HN "weekend" now?
