
Show HN: Orac – Filter news from social media using AI - jedwhite
https://app.orac.ai/
======
jedwhite
Developer here. Like a lot of programmers, I struggle with distraction, and I
get frustrated with time wasted on addictive social media. I also feel
frustrated by the lack of control I have over the algorithms that control what
I see on social media, and the constant tracking and privacy invasion.

So I've been working on Orac as a way to get interesting and useful content
that matters from social media, without losing focus. It's currently an early
preview version. The front end is built as a web app with React and GraphQL,
and it uses deep learning to rank quality and predict attributes about stories
it finds shared on social media (such as seriousness, objectivity, political
stance etc). The back end is AWS Kinesis for stream ingestion, Lambda for
running inference, and DynamoDB and Elasticsearch, along with AWS AppSync.
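
For readers unfamiliar with the Kinesis-to-Lambda pattern, here is a minimal sketch of what an inference handler could look like. It is an illustration only, not Orac's actual code: the `score_story` function, the story fields (`url`, `text`) and the scoring rule are all hypothetical. The one fixed detail is that Kinesis hands Lambda each payload base64-encoded under `record["kinesis"]["data"]`.

```python
import base64
import json


def score_story(story):
    """Hypothetical stand-in for model inference; returns made-up scores."""
    text = story.get("text", "")
    # Placeholder rule (an assumption): longer text counts as more "serious".
    return {"seriousness": min(len(text) / 1000.0, 1.0)}


def handler(event, context):
    """Lambda entry point for a Kinesis event source mapping."""
    results = []
    for record in event.get("Records", []):
        # Kinesis record payloads arrive base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        story = json.loads(payload)
        results.append({"url": story.get("url"), "scores": score_story(story)})
    # A real pipeline would persist these (for example to DynamoDB).
    return results
```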

While it's experimental, the content predictions are already pretty
interesting, and it has a bunch of pre-built filters as a way to play around
with them - such as moods, personas and filter bubbles.

Would love to know what folks think and if this is something useful to people
that's worth pursuing!

~~~
nikolay
Is this branding related to Blake's 7?

~~~
Jaruzel
Probably not. The author seems American; the site has a very US news bias, and
only has US politics filtering.

Which is a shame. I hope the author plans to expand it to have a more global
viewpoint.

For those that don't know - 'Blake's 7' was a 1970s UK sci-fi TV show that had
a portable 'super-computer' called Orac, whose personality was arrogant and
aloof. The show was set in a dystopian future where a despised militarised
government rules the galaxy. Still worth watching even now to be honest.

~~~
jedwhite
I'd commented above that the current feed is intended for testing, but that
the plan is for you to add your own social feeds and news sources, and that it
will then skew similar in finding additional sources.

The "filter bubbles" are also intended as a couple of examples, so initially
I've used US ones.

I'm Australian and while I'm based in the US, one of the problems Orac is
trying to solve is localisation. Orac is running geo and locale predictions on
content, but they aren't very good so far, so I haven't included them in the
UI or filtering yet. But a few folks who I've shown have said that it would be
useful to filter content that's relevant by location, so that is definitely in
the roadmap!

------
maverick2007
So I've been thinking about this a little bit and I'm not sure if I'm the
outlier here but when I really want to focus on something, I'll turn off
social media completely. The medium is the distraction in itself, no matter
who I follow. With that being said, I think that this tech has a ton of
potential.

One use case off the top of my head would be for fantasy football. I'd love to
see some sort of mode where I can get high quality news for my chosen few
players or teams. I run into an issue right now where there's a ton of noise
around "is this player injured or not" and it'd be great to have some sort of
AI that could do a better job than I can in filtering what's fake or not.

Plenty of other interesting use cases for this tech, this is just one that
jumps out for me!

~~~
jedwhite
Thank you! Your feedback is similar to a lot of the feedback I've received
from friends I've shown it to, so I don't think you're an outlier at all.

I think most people do something similar to you, and block themselves from
social media. I know I do. People resort to deleting their accounts,
temporarily blocking access using browser extensions, or removing the apps
completely. The problem I've found is that there is interesting work-related
content that I do want to know about that gets shared through social media
(and even more messaging). And I get work messages through Facebook and
Twitter, and then half my day is blown. I think that's worse for people on a
maker's schedule. You need the important content (say new deep learning
research) but you blink and you've shot half a day after glancing at Facebook.
The direction I'm heading with Orac is to use the topic modelling and a
doc2vec approach to match up clusters of concepts between content, and your
todo list. That's still a way off but this is intended to be a step in that
direction.
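
As a rough illustration of matching content against a todo list, the sketch below ranks stories by their similarity to the closest todo item. It deliberately swaps in plain bag-of-words vectors and cosine similarity in place of doc2vec embeddings and topic models, so it only shows the shape of the idea; the function names and sample inputs are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words counts (a simple stand-in for a doc2vec embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_for_todo(todo_items, stories):
    """Return (score, story) pairs, best match to any todo item first."""
    todo_vecs = [vectorize(t) for t in todo_items]
    scored = []
    for story in stories:
        vec = vectorize(story)
        best = max((cosine(vec, t) for t in todo_vecs), default=0.0)
        scored.append((best, story))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With a todo item like "read new deep learning research", a story about deep learning research would rank above one with no words in common, which scores zero.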

The deep learning models are still in progress, but they're already getting
pretty good at ranking content and identifying topics etc. One friend
described them as "scary" and I don't think they are quite there but they are
improving.

Adding "build your own" filters is definitely on the road map - so something
like the fantasy football tracking is a great use of that!

~~~
maverick2007
I didn't think about it that way but you're totally right! The reason I don't
open up social media right now is because I know there's going to be garbage
there that distracts me. If I could open Orac up and it would block anything
that it deemed as distracting then I don't see a reason why I wouldn't use it.

I really look forward to seeing what becomes of this!!

~~~
jedwhite
Hey thank you - yes that's a key driver. You don't want to have to throw the
baby out with the bathwater, so to speak :)

There is incredibly useful and inspiring content out there on social media. I
just don't want to have to trawl through a cesspool of crap to find the
nuggets of usefulness buried in it!

------
sjclemmy
Blake’s Seven? [0]

0: [http://blakes7.wikia.com/wiki/Orac](http://blakes7.wikia.com/wiki/Orac)

~~~
jedwhite
Well picked! :)

Another Terry Nation fan I'm guessing.

There's a feature being worked on that is some way down the track, but will
make sense of the name. If you'll forgive the teaser, it's related to
having conversational control.

~~~
sjclemmy
I was a child when it was first broadcast and I loved it. It was the
inspiration for many a childhood game. We always used to argue about who was
going to be who - everyone wanted to be Blake, no-one wanted to be Servalan.

I think I recall trying to make Orac out of a shoe box and felt tip pens!

~~~
jedwhite
I've been planning to make an Orac from a perspex case using a Raspberry Pi
with some speakers and a camera (and a lot of pretend wiring). But haven't had
chance to get further than some rough plans :)

------
bradknowles
So, one angle I'm interested in is the surfacing of new sources of good
information. For example, many popular sites are basically just aggregators of
content from other sources, with perhaps a little light commentary on top.

Many years ago, I might have gone to Slashdot for the commentary from the site
members, but that devolved into a festering sewage pit a long time ago. But
slashdot does still sometimes link to good articles -- you just have to ignore
all the commentary on the site.

So, how do you keep discovering good sources of content and feeding them into
the system? If I wanted to feed a bunch of sites into the system and let you
do the work of filtering them for me, how would I go about that?

Another angle I'm interested in is the deduplication of content, and hoisting
the value of the earlier posts over the later ones that are just regurgitating
what some other site said. And related to that, how do you surface newer posts
that are actually an update to the older article, with new information?

Any thoughts or observations you can share?

~~~
jedwhite
I think you'll like some of the ideas we're working on.

For this preview release, you can't control the sources. But when it gets
released for real, you will be able to create an account and add your own
social accounts (twitter, facebook, linkedin, reddit) to monitor, and also add
your own news sources (RSS feeds, websites to spider etc).

Based on the sources you add, Orac will then try to find other interesting
similar sources for content to include in the feed.

On the back-end, we're basically building a database of media content with
quality and attribute predictions designed for searching and filtering based
on topic model matching.

There are some interesting ideas we're working on with de-duplication
(establishing canonical sources, effectively). But that's super early.
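
One simple way to picture "establishing canonical sources" is sketched below. It is an assumption about how it could work, not a description of Orac's approach: posts are compared by Jaccard similarity over word shingles, and within each group of near-duplicates the earliest post is treated as canonical. The field names (`url`, `text`, `published`) and the 0.6 threshold are hypothetical.

```python
import re


def shingles(text, k=3):
    """Set of k-word windows from the text, lowercased."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Overlap of two sets as a fraction of their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def canonicalize(posts, threshold=0.6):
    """Map each canonical (earliest) url to the urls of its later near-duplicates."""
    groups = []  # (shingle set of canonical post, canonical url, duplicate urls)
    for post in sorted(posts, key=lambda p: p["published"]):
        sig = shingles(post["text"])
        for canon_sig, _, dupes in groups:
            if jaccard(sig, canon_sig) >= threshold:
                dupes.append(post["url"])
                break
        else:
            # No existing group is similar enough, so this post starts a new one.
            groups.append((sig, post["url"], []))
    return {url: dupes for _, url, dupes in groups}
```

This greedy grouping only compares against each group's first post, so it does not address the harder case of a later article that adds genuinely new information.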

------
YC0mbi_Dave
Great stuff Jed. Have had a play with the latest preview and like it a lot.
Also starting to get a better sense of how you could apply Orac in an
organisational context and look forward to discussing next time we catch up.
Cheers, Dave

~~~
YC0mbi_Dave
PS: I can see a couple of use cases for it - e.g. teams working on projects or
tracking media coverage of specific issues. [Disclaimer - I know Jed]

------
tixocloud
Great concept. Would you be able to share how you would be able to filter
based on mood, personas, etc?

I can think of a few commercial use cases where this could be incredibly
useful, but that would likely depend on the methodology you’ve used. Guess I’m
interested in how you define quality and what attributes you are
predicting.

~~~
jedwhite
Thank you! Under the hood, there are about 80 different predictions or scoring
attributes being run. A lot of those are deep learning models.

A mood or a persona is really an algorithm representing a batch of those. A
lot of them you can get pretty close to just by using the Power Filter
directly. But there is some secret sauce.

But broadly speaking, you can think of a Hacker persona as being someone
interested in Science and Tech and Education as key broad topic areas, wanting
more serious content, and wanting content from credible science-based sources.
They prioritise useful content, with more in-depth coverage.

An Activist might be someone with a center-left to left stance, interested in
social fields, politics and psychology, but also influenced by trends and what
is currently generating a lot of "heat".
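
To illustrate the idea of a persona as a bundle of attribute predictions, here is a minimal sketch that treats each persona as a set of weights over per-story scores. The attribute names, weights and threshold are invented for the example and say nothing about the real models or the "secret sauce".

```python
# Hypothetical attribute names and weights, chosen only for illustration.
HACKER = {"topic_science_tech": 0.4, "seriousness": 0.3, "credibility": 0.2, "depth": 0.1}
ACTIVIST = {"topic_politics": 0.4, "stance_left": 0.3, "virality": 0.3}


def persona_score(predictions, persona):
    """Weighted sum of a story's attribute predictions (each assumed in 0..1)."""
    return sum(weight * predictions.get(name, 0.0) for name, weight in persona.items())


def filter_feed(stories, persona, threshold=0.5):
    """Keep only stories whose persona score meets the threshold."""
    return [s for s in stories if persona_score(s["predictions"], persona) >= threshold]
```

A story scored high on science, seriousness and credibility would pass the Hacker filter, while a viral but unserious one would not.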

If you have representations of what good quality content "within that
community of personas" looks like, then you can train models and test scoring
systems that try to capture them.

Content quality is a big topic. But our working thesis is that there are
canonical examples within domains of expertise of what great content looks like
- PG's essays, Pulitzer Prize-winning journalism, award-winning research
papers. And there are similar examples of what very bad content looks like -
hate speech, extremism etc.

So that's been the broad basis for the models that underlie the predictions.

[edits for typos]

~~~
vinni2
Just wondering if you could provide more details on your model like do you use
a CNN or an LSTM? Is there a paper you have written about it? Also how do you
get the data for training your models? Also do you do any retraining as new
data comes in? Do you also have some ground truth to measure the performance of
your models?

~~~
jdh30
I just noticed your post
([https://news.ycombinator.com/item?id=18050422](https://news.ycombinator.com/item?id=18050422))
about courses but cannot comment on it because it is too old. I am an employer
with strong opinions. If you'd like to hear them please e-mail me here:
[https://www.idtechex.com/contact/team/dr_jon_harrop.asp](https://www.idtechex.com/contact/team/dr_jon_harrop.asp)

------
darrenwestall
How can I get in touch with you please? I’d love to plug this into what I’m
doing, there is a lot of synergy.

~~~
jedwhite
Hey Darren, very happy to chat and find out more about what you're doing.
Anyone is welcome to email me about Orac with ideas, feedback etc too - jed
[at] orac [dot] ai

------
abrichr
In Power Filter mode, it seems the following knobs are available:

* Overall rating

* Political Stance

* Political Balance

* Seriousness

* Reading Level

* Objective Tone

* Credibility

* Virality and Engagement

* Temperament

How do you measure these objectively?

~~~
jedwhite
This is a great question, but it's hard to address in a comment. However, I'll
have a go (in brief) :)

There are biases in all algorithms because they represent an encoding of
either judgement or bias in training data selection.

So our aim here is to control for bias, and take some steps towards giving the
user more control, compared to an approach like Facebook's where the algorithm
is entirely a black box.

The approach we've taken is, effectively, to try to codify editorial judgement
and professional journalistic best practices into the system, and the
selection of training data. As well as being a programmer / computer
scientist, I'm also a professionally trained journalist and was editor of
Australia's leading computer magazine.

The knobs aren't direct representations of model predictions themselves, but
weighted summary scores based on algorithms using a number of lower-level
predictions (using statistical models, ML/DL models, ensembles). The
personas/moods/filter bubbles use more of the individual attribute
predictions.

Having said that, in practice it's all a bit of an experimental mish-mash
currently, and we have a lot of work still to do. Some predictions are way
more effective than others, as you can see browsing through it. And others
(like source and author quality and attribute prediction) are learning and
improving over time. But we have a lot of iteration and experimentation ahead
of us!

In practice, the initial feedback has been that the predictions are
surprisingly good. But sometimes they are way off the mark.

------
21stio
Hey Jed, cool stuff! What's your vision for the product?

~~~
jedwhite
There are a couple of ideas driving it, but the main one is to use AI (deep
learning sequence to sequence models especially) to filter and recommend
content to help busy people stay focused on their work.

The background is that social media is making us dumber and less productive.

So I want to make something that lets you focus on your work by finding the
best content matching what you're working on or interested in, and presenting
it in a distraction-free way.

~~~
21stio
I totally agree.

How do you want to enable the user to use that functionality?

Is it only intended to apply the classification to content that's available
through orac.ai?

What are your thoughts on delivering it as a browser extension and/or api?

