Can you elaborate more on this?
My last startup, Clarify.io, did automatic speech recognition using machine learning. Since I was handling the roadmap and API design, I wanted a better understanding of ML and started digging into it.
For context, I started with the Austin tech community because it's the one I know, so I could manually check the conclusions.
Overall, the system tracks 5,500+ groups, conferences, etc., which together account for close to 60k events. It grows as it discovers new groups and events through a few channels.
As it finds and imports groups, and then their events, it categorizes each one based on how it's described. If the source is Meetup.com, the category data and topics are reasonably good, so it starts with those. Eventbrite's are less reliable, and a standalone event website is harder still.
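One way to picture "start with the source's categories, trusting some sources more than others" is a per-source trust weight: structured topics from the source count for more when the source is reliable, and free-text matches make up the difference when it isn't. This is a minimal sketch, not the actual system; the `SOURCE_TRUST` values, the `Listing` shape, and `initial_topic_scores` are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical trust weights per source; the real system's values are not stated.
SOURCE_TRUST = {
    "meetup": 0.9,      # structured category/topic data is reasonably good
    "eventbrite": 0.6,  # category data is less reliable
    "website": 0.3,     # free-form pages are the hardest to categorize
}


@dataclass
class Listing:
    source: str
    title: str
    description: str
    source_topics: list[str] = field(default_factory=list)


def initial_topic_scores(listing: Listing, vocabulary: set[str]) -> dict[str, float]:
    """Seed topic scores from source metadata (weighted by trust),
    then add evidence from whole-word matches in the free text."""
    trust = SOURCE_TRUST.get(listing.source, 0.3)  # unknown sources treated like websites
    scores: dict[str, float] = {}
    for topic in listing.source_topics:
        key = topic.lower()
        scores[key] = scores.get(key, 0.0) + trust
    text = f"{listing.title} {listing.description}".lower()
    for term in vocabulary:  # vocabulary terms are assumed to be lowercase
        if re.search(rf"\b{re.escape(term)}\b", text):
            # Text evidence counts more when the source metadata is less trustworthy.
            scores[term] = scores.get(term, 0.0) + (1.0 - trust)
    return scores
```

With this weighting, a Meetup listing tagged "Python" scores mostly on its tag, while a bare conference website has to earn its categories from its text.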
After ~15 months, it has discovered about 105k keywords/phrases, some only barely related. Of those, only about 7,500 are actually useful. Not surprisingly, the useful ones include languages, frameworks, companies, products, and combinations of words. (Side note: I've found that words like "hacking" are less of an indicator now than when I started, because everyone is "hacking" something: marketing, cooking, etc.)
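The "hacking" observation suggests one simple filter for separating useful keywords from noise: keep a term only if it shows up often enough *and* mostly in tech contexts. The sketch below is a swapped-in illustration of that idea, not the author's method; it assumes each keyword occurrence is already labeled as appearing in a tech or non-tech listing (for example, from previously qualified groups), and `useful_keywords`, `min_count`, and `min_precision` are hypothetical.

```python
from collections import Counter


def useful_keywords(
    occurrences: list[tuple[str, bool]],
    min_count: int = 5,
    min_precision: float = 0.8,
) -> set[str]:
    """occurrences: (keyword, appeared_in_a_tech_listing) pairs.
    Keep keywords that are frequent enough and mostly appear in tech listings."""
    total: Counter[str] = Counter()
    in_tech: Counter[str] = Counter()
    for keyword, is_tech in occurrences:
        total[keyword] += 1
        if is_tech:
            in_tech[keyword] += 1
    return {
        kw
        for kw, n in total.items()
        if n >= min_count and in_tech[kw] / n >= min_precision
    }
```

Under this rule "hacking" drops out once marketing and cooking events start using it, even though it still appears in plenty of tech listings, while rare phrases are held back until there is enough evidence.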
From all that, each group is qualified in or out based on its overall score. I manually review the borderline cases, but that's only 2-3 most weeks. The keyword/phrase list also feeds the hashtags the system uses.
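A minimal sketch of "qualify in/out by overall score, with a borderline band for manual review," assuming each useful keyword/phrase carries a numeric weight. The weights, the two thresholds, and the function names are hypothetical; the matched terms are reused to pick hashtags, mirroring how the keyword list feeds the hashtags.

```python
import re


def score_group(text: str, weights: dict[str, float]) -> tuple[float, list[str]]:
    """Sum the weights of keywords/phrases found as whole words in the text."""
    lowered = text.lower()
    matched = [t for t in weights if re.search(rf"\b{re.escape(t)}\b", lowered)]
    return sum(weights[t] for t in matched), matched


def qualify(score: float, accept_at: float = 3.0, reject_below: float = 1.0) -> str:
    """Hypothetical thresholds: clear cases are automatic, the middle band is reviewed."""
    if score >= accept_at:
        return "in"
    if score < reject_below:
        return "out"
    return "review"  # borderline: queued for manual review


def hashtags(matched: list[str], weights: dict[str, float], limit: int = 3) -> list[str]:
    """Turn the highest-weighted matched terms into hashtags."""
    top = sorted(matched, key=lambda t: weights[t], reverse=True)[:limit]
    return ["#" + re.sub(r"[^a-z0-9]", "", t) for t in top]
```

For example, with `{"python": 2.0, "machine learning": 1.5, "meetup": 0.2}`, the text "Austin Python Meetup: intro to machine learning" scores about 3.7, qualifies as "in", and yields `#python`, `#machinelearning`, `#meetup`.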
The first version (hardcoded, no ML) was hacked together in a day. I've since rebuilt it from the ground up to wire in the ML processing so it can scale across all the cities.
I later did some major refactoring to make the output pluggable, so it can broadcast into a Slack channel (done and released) and, eventually, send you a reminder DM or text message (via Twilio!).
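"Pluggable output" usually means every destination implements the same small interface, so adding SMS later doesn't touch the Slack code. Below is a minimal sketch using a `typing.Protocol`; `Event`, `SlackChannelOutput`, `SmsReminderOutput`, and `broadcast` are hypothetical, and both outputs only record messages in memory instead of calling the real Slack or Twilio APIs.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Event:
    title: str
    url: str
    hashtags: list[str]


class Output(Protocol):
    def send(self, event: Event) -> None: ...


@dataclass
class SlackChannelOutput:
    """Hypothetical stand-in: a real version would post to Slack."""
    channel: str
    sent: list[str] = field(default_factory=list)

    def send(self, event: Event) -> None:
        tags = " ".join(event.hashtags)
        self.sent.append(f"[{self.channel}] {event.title} {event.url} {tags}")


@dataclass
class SmsReminderOutput:
    """Hypothetical stand-in: a real version would send a text via Twilio."""
    phone: str
    sent: list[str] = field(default_factory=list)

    def send(self, event: Event) -> None:
        self.sent.append(f"to {self.phone}: Reminder: {event.title} {event.url}")


def broadcast(event: Event, outputs: list[Output]) -> None:
    """Fan one event out to every configured output."""
    for out in outputs:
        out.send(event)
```

Because `Output` is structural, any class with a matching `send` method plugs in without inheriting from anything, which keeps new channels (DMs, email) independent of the core pipeline.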
They have their place, just not in my system.