

Fighting spam with pure functions - Meist
https://www.facebook.com/notes/facebook-engineering/fighting-spam-with-pure-functions/10151254986618920

======
andrewcooke
it's nice, and i don't want to be that guy, but surely this screams out to be
implemented in lisp unless the type system (not mentioned here) is important
(and i'm not a lisp hacker!). or maybe they have separate teams developing and
using the language and purposefully want to expose a limited interface to the
users?

in short: why a new language rather than extending a language that loves to be
extended?

maybe they want to exclude mutable state totally? even then, wouldn't it be
easier to hide the relevant special forms?

~~~
lbrandy
Hi. Author here.

> and i don't want to be that guy,

Nah. Be that guy. They are all valid questions.

> unless the type system (not mentioned here)

I had bits in about the type system but they got cut because, well, it was too
long. It has static types (not many, just the basic types required to parse
JSON) plus first order functions. There are no user-defined types beyond ad-hoc
ones via maps/vectors. And it has "strong" typing and does type inference.

> but surely this screams out to be implemented in lisp

One of the reasons this developed into its own language is historical. It
started off very, very simple and slowly evolved into a domain-specific
language. But there is a set of design requirements that complicates the
situation: 1) easy to deploy changes and patch running instances (response
time is critical), 2) embedding in a service, interoperability with the
service, and so on.

> want to expose a limited interface to the users?

Yes, indeed. Part of the goal is to make the language as simple as possible
for analysts to use. Obviously you need to be technical to some degree to
write in a functional language, but we were trying to make it a business-logic
layer on top of the infra.

------
chime
I love the approach you took with halting I/O operations until they can all be
batched later. Could this be done in other languages, say JS? I know Python
has yield, but I wonder how one could batch all Ajax queries during pageload
of dashboards using this pattern.
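
For what it's worth, here is a rough sketch of how the "park the I/O, batch it later" idea could look with Python generators. This is not how FXL does it; it only illustrates the pattern. Every name here (`run_batched`, `fetch_many`, `FAKE_DB`, the two rules) is made up for the example, and the "fetch" is just a dict lookup standing in for a real batched request.

```python
from typing import Callable, Dict, Generator, List

# A rule yields the key it needs and is resumed with the fetched value.
Rule = Generator[str, object, object]

def run_batched(rules: List[Rule],
                fetch_many: Callable[[List[str]], Dict[str, object]]) -> List[object]:
    """Run every rule until it blocks on a fetch, then fetch all keys at once."""
    results: List[object] = [None] * len(rules)
    pending: Dict[int, str] = {}  # rule index -> key it is waiting for
    for i, rule in enumerate(rules):
        try:
            pending[i] = next(rule)
        except StopIteration as stop:
            results[i] = stop.value  # rule finished without any I/O
    while pending:
        # One batched fetch per round, with duplicate keys collapsed.
        values = fetch_many(sorted(set(pending.values())))
        next_pending: Dict[int, str] = {}
        for i, key in pending.items():
            try:
                next_pending[i] = rules[i].send(values[key])
            except StopIteration as stop:
                results[i] = stop.value
        pending = next_pending
    return results

# Hypothetical data source; `calls` records each batch that was requested.
FAKE_DB = {"friends:alice": 3, "friends:bob": 250, "age:alice": 2}
calls: List[List[str]] = []

def fetch_many(keys: List[str]) -> Dict[str, object]:
    calls.append(list(keys))
    return {k: FAKE_DB[k] for k in keys}

def few_friends(user: str) -> Rule:
    n = yield f"friends:{user}"
    return n < 10

def new_and_lonely(user: str) -> Rule:
    n = yield f"friends:{user}"
    days = yield f"age:{user}"
    return n < 10 and days < 7

results = run_batched(
    [few_friends("alice"), few_friends("bob"), new_and_lonely("alice")],
    fetch_many,
)
```

Three rules needing four lookups end up as two batched fetches, because each round collects whatever every parked rule is waiting on. The same shape should carry over to JS with generators or promises, with one combined Ajax call per round.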

------
gmac
Interesting, but would be nice to hear why they're _writing rules_ to catch
this, rather than (or on top of) taking the Bayesian route (as in pg's 'Plan
for Spam'[1]).

1\. <http://www.paulgraham.com/spam.html>

~~~
mkjones
Cow-orker of lbrandy (and nbm!) here, and one of the clients of FXL. We
definitely make use of machine-learned models to catch some kinds of spam.
They're good at keeping old attacks at bay, and some are pretty good at
catching new attacks before they have a chance to evolve.

But any time we get good enough at catching an attack on a given channel, the
attackers will switch to a different one - often one where we haven't
seen abuse before, and maybe don't have much good training data. In this case,
it takes time to gather the right data and train new models, whereas analysts
and engineers can do a reasonable job of stopping the attack in a faster
timeframe.

Though interesting, all of this is somewhat orthogonal to what this article is
about. FXL is the way we define features that are fed into
classifiers, and its engine does all the data-fetching necessary to get the
values of those features for classification. It also works for just putting
rules on top of or next to the ML.

------
mingpan
I'm also wondering the same thing as others. What specifically made it
worthwhile to make this a separate language, as opposed to a domain-specific
language implemented on top of an existing language with an existing
ecosystem? You could even enforce the usage of only a subset of the base
language if you really wanted.

------
friendly_chap
Is that why Simon Marlow was hired by Facebook, to develop that language?

------
drobilla
Short circuiting AND isn't a very good example of the advantages of pure
functions. Most languages, including those heavily based on mutation (e.g. C
and its descendants), do this.
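
A tiny sketch of the point, in Python rather than C, with made-up function names: `and` short-circuits even when the operands mutate state. What seems to change without purity is that swapping the operands becomes observable, which is my guess at what the article was really after.

```python
calls = []  # shared mutable state that both checks write to

def is_positive(x):
    calls.append("is_positive")
    return x > 0

def is_even(x):
    calls.append("is_even")
    return x % 2 == 0

# Left operand is False, so the right operand is never evaluated.
first = is_positive(-1) and is_even(-1)
calls_first = list(calls)

calls.clear()

# Same two checks in the other order: same answer, different side effects.
second = is_even(-1) and is_positive(-1)
calls_second = list(calls)
```

Both expressions evaluate to `False`, but the recorded calls differ, so an engine could only reorder the operands freely (say, cheapest first) if it knew the functions had no side effects.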

