

Detecting Algorithmically Generated Domain Names - wooster
http://nbviewer.ipython.org/github/ClickSecurity/data_hacking/blob/master/dga_detection/DGA_Domain_Detection.ipynb

======
llasram
If the goal is to actually detect DGA activity vs just playing around with
data, check out
[https://www.usenix.org/conference/usenixsecurity12/technical...](https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/antonakakis)
for a much more effective approach.

The key difference is the use of _sets_ of (NXDomain) domains vs single
domains. With a few additional features, the boost in signal is sufficient to
allow classification as individual DGAs with essentially no false positives.
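
To make the "sets vs single domains" point concrete, here is a minimal sketch of computing aggregate features over a whole group of NXDomains instead of scoring each name alone. The feature choices (mean label length, mean character entropy, distinct TLD count) and the names `shannon_entropy` and `set_features` are illustrative assumptions, not the feature set from the paper.

```python
import math
from collections import Counter
from statistics import mean


def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def set_features(nx_domains):
    """Aggregate features over a set of NXDomains (hypothetical feature set)."""
    if not nx_domains:
        raise ValueError("need at least one domain")
    cleaned = [d.lower().rstrip(".") for d in nx_domains]
    # Leftmost label only; a real system would handle public suffixes properly.
    labels = [d.split(".")[0] for d in cleaned]
    tlds = {d.rsplit(".", 1)[-1] for d in cleaned}
    return {
        "count": len(labels),
        "mean_length": mean(len(label) for label in labels),
        "mean_entropy": mean(shannon_entropy(label) for label in labels),
        "distinct_tlds": len(tlds),
    }
```

The intuition is that one odd-looking name is weak evidence, while a batch of names with similar length and entropy statistics is much harder to explain as coincidence.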

~~~
meowface
The project discussed in this paper requires having access to all of the
passive DNS data in an entire ISP's network, which isn't that practical for
many researchers.

OP's machine learning is arguably even more impressive, because it has a
decent success rate based entirely on open source data and the domain names
themselves, with no other corroborating information (like an NXDOMAIN
response).
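
For a sense of what "based entirely on the domain names themselves" can look like, here is a rough sketch that scores a name by how many of its character trigrams also appear in a list of known-benign names. This is an assumption about one plausible name-only feature, not a claim about what OP's notebook does; `build_ngram_set` and `familiarity` are made-up names, and the benign list in the usage note is a toy.

```python
def ngrams(s, n=3):
    """All contiguous character n-grams of s."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_ngram_set(benign_labels, n=3):
    """Collect every n-gram seen in a corpus of benign domain labels."""
    seen = set()
    for label in benign_labels:
        seen.update(ngrams(label.lower(), n))
    return seen


def familiarity(label, known, n=3):
    """Fraction of the label's n-grams that occur in the benign corpus."""
    grams = ngrams(label.lower(), n)
    if not grams:
        return 0.0
    return sum(g in known for g in grams) / len(grams)
```

With `known = build_ngram_set(["google", "facebook"])`, a name stitched from familiar pieces such as `"googlebook"` scores high, while a random-looking string such as `"xqzvkw"` scores zero. A score like this would be one input to a classifier, not a verdict on its own.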

~~~
llasram
You only need large quantities of pDNS data for the discovery portion. For
classification, all you need are collections of domains produced by the same
algorithm (which are readily available for the widely known DGAs). The
domains being NX isn't so much corroboration as the foundation of the
method -- the NX-producing search over multiple domains is the observable
behavioral distinction between AGD and static C&C discovery.
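
As a sketch of how that behavioral signal could be pulled out of a DNS log, the snippet below groups NXDOMAIN responses by querying client and keeps only clients that failed on several distinct names. The `(client, domain, rcode)` record shape, the `group_nx_by_client` name, and the `min_domains` threshold are all assumptions for illustration, not part of any production system.

```python
from collections import defaultdict


def group_nx_by_client(records, min_domains=5):
    """Map each client to the set of distinct domains it got NXDOMAIN for.

    records: iterable of (client, domain, rcode) tuples (hypothetical format).
    Clients with fewer than min_domains distinct NX names are dropped.
    """
    groups = defaultdict(set)
    for client, domain, rcode in records:
        if rcode == "NXDOMAIN":
            groups[client].add(domain.lower().rstrip("."))
    return {c: names for c, names in groups.items() if len(names) >= min_domains}
```

Each surviving set is then a candidate for comparison against collections of domains from known DGAs.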

(I've worked with both Antonakakis and Yadav, and implemented the production
version of Damballa's AGD classifier as per Antonakakis).

------
pestaa

        # I'm SURE there's a better way to store all the counts but not sure...
    

I'm seeing the progress with which this comment came to life.

