

Show HN: Wondering which SDKs are in iOS and Android apps & how they were made? - NateLawson

I started SourceDNA because I wanted to build a highly scalable, cross-platform binary similarity engine. We can dump in libraries and apps from all over and discern patterns in their code. We've been scanning thousands of mobile apps and finding what's inside, and we wanted to make this data available now for others to explore.

Clickable link: http://sourcedna.com/stats/

This interface lets you see which SDKs (ads, analytics, optimization, etc.) or cross-platform tools (Unity, Adobe AIR, Xamarin, etc.) were used to create the top 500 free apps on both iTunes and Google Play app stores. You can select an individual SDK vendor and the apps containing their code will be listed at the bottom. You can also click on an individual app to see what's inside it.

I'd love to hear how you'd use something like this and if you have suggestions on how to improve it. If you're interested in the technical details of how we managed to do all this, I'm happy to talk about them here.

Nate Lawson, Founder
======
NateLawson
Here's a link to it:
[http://sourcedna.com/stats/](http://sourcedna.com/stats/)

The UI was built with Knockout, ChartJS, jQuery, and Bootstrap. The upcoming
version will use D3.js since it's better for slicing data, and we've used it
successfully elsewhere.

Analysis backend is a varying combination of PostgreSQL, Celery, EMR,
ElasticSearch, and home-grown tools. You tend to have highly customized
pieces, such as our own disassembler, when tackling large-scale binary
analysis. Most of our engineers have a reverse-engineering background.

------
chimbycomm
Truly amazing. I don't know much about disassembling or binary analytics, but
I do find it fascinating that so much can be learned in an introspective
manner. Are developers generally OK with divulging their dev tools and which
SDKs they use? Is binary code really that transparent?

~~~
NateLawson
Thanks, glad you like it. We're putting this out there to figure out what
kinds of info developers would like about how tools are actually used by
others. Let us know via the Contact tab there if you have some questions or
think you have some problems data like this could solve.

Binary code is difficult to analyze. Typically, it's taken a skilled person
with a disassembler and knowledge of the software in use to figure out the
kinds of things we're showing here.

We've built a custom similarity engine that does this on a large scale. It
breaks each binary into snippets that serve as representative features, looks
up known snippets that are close to them, and finally combines the results to
return a list of what code was found in that binary. Attach that to a firehose
that is delivering thousands of apps per day, and you've got SourceDNA. It was
extremely challenging to develop, and we're proud of what we've built.
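
To make the split / look up / combine steps above concrete, here is a toy
Python sketch. It is not SourceDNA's engine, whose internals aren't described
here. It assumes snippets are overlapping k-byte windows and swaps the "close"
lookup for a plain exact-match index, where a real system would presumably
work on disassembled code and use approximate matching. The names `shingles`,
`build_index`, `identify`, and the `threshold` parameter are all made up for
illustration.

```python
from collections import Counter


def shingles(data, k=4):
    """Split a byte string into the set of its overlapping k-byte snippets."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def build_index(libraries, k=4):
    """Map each snippet to the names of the known libraries that contain it."""
    index = {}
    for name, code in libraries.items():
        for snippet in shingles(code, k):
            index.setdefault(snippet, set()).add(name)
    return index


def identify(binary, libraries, k=4, threshold=0.5):
    """Return the libraries whose snippets mostly show up in `binary`.

    Exact matching stands in for a real nearest-neighbor lookup (assumption).
    """
    index = build_index(libraries, k)

    # Look up every snippet of the binary and count hits per library.
    hits = Counter()
    for snippet in shingles(binary, k):
        for name in index.get(snippet, ()):
            hits[name] += 1

    # Combine: keep libraries where enough of their snippets were found.
    found = []
    for name, code in libraries.items():
        total = len(shingles(code, k))
        if total and hits[name] / total >= threshold:
            found.append(name)
    return sorted(found)
```

For example, if `libraries` maps a hypothetical `"ads_sdk"` to some byte string
and that byte string is embedded somewhere inside `binary`, `identify` returns
a list containing `"ads_sdk"`. The per-library hit ratio is the simplest
possible way to combine results; it ignores snippet order and shared
boilerplate, which a production system would have to handle.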

We've found that most developers are interested in finding out what others are
doing, best practices, etc. I personally find it more enjoyable to apply
machine learning to tracking code instead of users, which is where big data is
more commonly applied.

------
smirolo
That must be the most comprehensive study on binary executables I have seen so
far. Looks amazing!

