
The RNN does in fact directly predict emoji. It outputs a vector of length 1624 (the number of emoji) containing a score for each emoji given the input text. This vector of scores is what can be thought of as the point in semantic space.

The issue with multiple meanings is that if you strongly predict an ambiguous emoji (say the prayer emoji), how do you then work out what concept the sentence contains (e.g. was the person saying "thanks", "high five", or "please")?

[I'm also a Dango dev]


So yeah: we can focus on vectors at different levels of the net and these are in some sense different semantic spaces. In the article I talk about a level immediately before it projects onto the emoji vectors. If you look at the output after the projection (and do a softmax) you get a probability distribution across all emoji. This would be a different space in which each axis is an emoji, rather than the emoji being points distributed around the space.
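As a toy sketch of that projection-plus-softmax step (the hidden size and weights here are made up; only the 1624-emoji output size comes from the thread):

```python
import numpy as np

NUM_EMOJI = 1624  # output vocabulary size mentioned above

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the real model's tensors: `hidden` is the
# pre-projection "semantic" vector, and `W`/`b` project it onto one
# logit per emoji.
hidden = rng.standard_normal(256)
W = rng.standard_normal((NUM_EMOJI, 256)) * 0.01
b = np.zeros(NUM_EMOJI)

logits = W @ hidden + b  # one unnormalized score per emoji

# Softmax turns the scores into a probability distribution whose axes
# are the emoji themselves.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

top5 = np.argsort(probs)[::-1][:5]  # indices of the 5 likeliest emoji
```

The pre-projection space is `hidden`; the post-projection space is `probs`, where each axis corresponds to a single emoji.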


Awesome, thanks for clarifying. So does the training optimize some property of the "semantic" layer immediately before the final emoji prediction layer? Or does it just optimize accuracy of emoji prediction directly?

And then the t-SNE projection shown in the article is based on this same layer (one before prediction)?


Well those are sort of equivalent. But yeah, we use cross-entropy between the projected output and the target emoji distribution as our objective to minimize.
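A minimal numpy sketch of that objective, assuming a one-hot target distribution (the emoji index and logits below are placeholders, not real model outputs):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a target distribution."""
    p = softmax(logits)
    return -np.sum(target * np.log(p + 1e-12))

NUM_EMOJI = 1624
rng = np.random.default_rng(0)

logits = rng.standard_normal(NUM_EMOJI)

# One-hot target: the single emoji attached to this training sentence.
target = np.zeros(NUM_EMOJI)
target[42] = 1.0  # hypothetical index of the labelled emoji

loss = cross_entropy(logits, target)

# A model that puts nearly all its mass on the correct emoji gets a
# near-zero loss, which is why minimizing cross-entropy is equivalent
# to maximizing the predicted probability of the target.
perfect_logits = np.full(NUM_EMOJI, -50.0)
perfect_logits[42] = 50.0
perfect_loss = cross_entropy(perfect_logits, target)
```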

And yes, we do the t-SNE on that pre-projection space. That's why we can visualize the targets (emoji) in it. We can also t-SNE the word embeddings themselves — the input to the RNN — which is also kind of interesting. It automatically learns all kinds of structures there. Chris Olah has a good post on word embeddings if you're interested: http://colah.github.io/posts/2014-07-NLP-RNNs-Representation...
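For anyone curious, that kind of t-SNE pass is a few lines with scikit-learn (assumed available here; the vectors below are random stand-ins for the real pre-projection activations):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for pre-projection vectors: 60 sentences embedded in a
# 32-dimensional "semantic" space, drawn from three loose clusters.
centers = rng.standard_normal((3, 32)) * 5
vectors = np.vstack([c + rng.standard_normal((20, 32)) for c in centers])

# t-SNE squashes the 32-D points down to 2-D for plotting while trying
# to preserve local neighbourhoods, so nearby sentences (and the emoji
# targets, if you embed them too) land near each other.
embedding = TSNE(n_components=2, perplexity=10,
                 init="random", random_state=0).fit_transform(vectors)
```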


(I am also a dev for Dango)

This is definitely a concern and something we've thought about but not yet fully solved. The neural net is trained on real-world data, which unfortunately includes various types of questionable, racist, sexist, etc. content. We've already blacklisted emoji combinations that too often get triggered in racist ways. However, such a system is very difficult to audit completely.

Your example comparing different skin tone modifiers is a good one that we hadn't thought of. I've made a note of it so we can try and improve.


Here's a more detailed article on it with some more images.

http://www.ship-technology.com/projects/flip-ship/

Specifically: "keeping the 700 long-ton mass steady and making it perfect for researching wave height, acoustic signals, water temperature and density, and for the collection of meteorological data."



Agreed. For me the issue here is more their handling of it. The new wording of their policy seems saner to me, but they need to address the issue in an open manner ASAP.


At our company we use Slack [1] all the time for chatting and keeping in touch. It's basically just an IRC front end. (Slack is co-founded by Stewart Butterfield of Flickr)

For me the big advantage is that it "just works" (web interface + mobile app) for everyone in the company whereas there is overhead to getting IRC setup. We had tried IRC previously and it didn't take. It also doesn't hurt that Slack looks sexy.

I think that IRC has really stuck around because it somehow really captures asynchronous group discussion, but also simultaneously keeps the barrier to participation really low. It makes total sense to me that the IRC community is still so strong.

[1] https://slack.com/


Same here, the UI is definitely a huge selling point for Slack. IRC just looks terrible: every client I have tried looks bad, and connections are unreliable. Slack is a great solution to that.


You can also get tab and scrollback support in the regular Mac Terminal with MouseTerm https://bitheap.org/mouseterm/


Minuum co-founder here:

Please check out our data collection policy at http://minuum.com/data/ . We actually wrote a fair amount of it ourselves to try and reduce the “legalese” aspect. Hopefully, as you suggest, we can work to increase transparency more in the future.

As someone mentioned below, the key reason we introduced network access was to enable language packs.

We also collect some analytics, but it's limited to data which is inherently anonymous. We really want to build an amazing keyboard that people will want to use every day, so it’s critical we be able to answer basic questions like “Is the keyboard installed?” and “Was it used today?”. Otherwise we couldn't validate our design decisions.

As stated in our data collection policy, if we ever want to collect anonymized typing statistics or other data related to what you type, that would be strictly opt-in and we would be as transparent as possible about it.

And yes, as others mentioned, we only access contact information if you use the import contacts feature in the settings.


It's mentioned in the footnotes of the article and discussed below https://news.ycombinator.com/item?id=7111183

Basically the way Dvorak is optimized is sub-optimal for the Minuum disambiguation engine and so it reduces the accuracy. It turns out you're much better off with QWERTY or better yet something specifically optimized for Minuum.
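A hypothetical toy illustration of why layout matters for disambiguation (this is not Minuum's actual engine, and the layouts are just flattened letter rows): if each tap only localizes to a neighbourhood of adjacent keys, words whose letters fall in the same neighbourhoods collide and the engine can't tell them apart.

```python
from collections import defaultdict

def neighbourhood_code(word, layout, radius=1):
    """Map each letter to a coarse position bucket of width 2*radius+1."""
    pos = {ch: i for i, ch in enumerate(layout)}
    return tuple(pos[ch] // (2 * radius + 1) for ch in word)

def count_collisions(words, layout):
    """Count extra words forced to share a tap sequence with another word."""
    buckets = defaultdict(list)
    for w in words:
        buckets[neighbourhood_code(w, layout)].append(w)
    return sum(len(g) - 1 for g in buckets.values() if len(g) > 1)

QWERTY = "qwertyuiopasdfghjklzxcvbnm"  # rows flattened left to right
DVORAK = "pyfgcrlaoeuidhtnsqjkxbmwvz"  # letters only, rows flattened

words = ["the", "and", "ant", "tip", "top", "tap", "ten", "tin", "ton"]

print("qwerty collisions:", count_collisions(words, QWERTY))
print("dvorak collisions:", count_collisions(words, DVORAK))
```

A layout "optimized for Minuum" would be one chosen to minimize this kind of collision count over a real word-frequency list, which is a different objective than the hand-alternation Dvorak was optimized for.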


I completely agree that as software makers, keyboard makers in particular, we have to understand how prevalent and important multilingualism is. ASCII, and the subsequent long road to Unicode, is a fantastic example of this Western-centric bias.

The Minuum team includes a number of multilingual people; I mix French and English on a daily basis, and others on the team speak Slavic languages regularly. Similarly, being based in Toronto, near the bilingual province of Quebec, our early user network of friends and family included many multilingual people.

I think that, more than almost any other kind of software designer, keyboard makers are acutely aware of the importance of multilingual support and the myriad issues surrounding it. We've known since very early on that language support is a major factor that will eventually make or break us.

For us, each new language actually brings a surprising number of exciting new technical challenges. Hopefully we'll be able to gain insights from the process and find those "few smart things" that can be done to get you from 5 keyboards to 1.


Funny you mentioned Québec when you're in Toronto: I wouldn't be surprised if there were more multilingual people in the GTA than there are in all of Québec.


Most Québécois speak both French and English, but I get what you're saying.

Toronto is so multicultural it doesn't make sense to claim proximity to Quebec as a language bonus.


I meant to suggest that between Toronto's multiculturalism and Quebec's bilingualism we had a lot of early users who were multilingual. Sorry if that was unclear.

