
Show HN: AI-driven online database of company profiles - jo_kruger
https://www.aihitdata.com
======
halo2foryoy
What component of this has anything to do with AI?

~~~
jo_kruger
It's like an iceberg: the website is just the tip. The website is an interface
to a database of company profiles. Each of these profiles was built from the
corresponding corporate website, all automatically. The process includes
website crawling, page categorization, language detection, entity extraction,
structured data composition, fuzzy comparison, etc. The key feature of the
platform is that the process is 100% automated, which allows us to repeat the
whole cycle for every profile on a regular basis, so we can compare structured
profiles over time and see all changes.
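The "fuzzy comparison" step that turns two crawls into a list of changes can be pictured roughly as follows. This is a minimal sketch, assuming each crawl yields a flat dict of field name to extracted value; `difflib`'s similarity ratio and the 0.9 threshold are illustrative stand-ins, not aiHitdata's actual method. Cosmetic differences (case, extra spaces) are ignored, while real edits become change events.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical snapshot format: field name -> value extracted from one crawl.
Profile = Dict[str, str]


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so cosmetic edits are not reported."""
    return " ".join(value.lower().split())


def is_same(old: str, new: str, threshold: float = 0.9) -> bool:
    """Fuzzy equality: values count as unchanged if their similarity ratio >= threshold."""
    return SequenceMatcher(None, normalize(old), normalize(new)).ratio() >= threshold


def diff_profiles(company: str, old: Profile, new: Profile,
                  threshold: float = 0.9) -> List[str]:
    """Compare two crawls of the same company site and describe each change."""
    events = []
    for field in sorted(old.keys() | new.keys()):
        before, after = old.get(field), new.get(field)
        if before is None:
            events.append(f"{company} added {field}: {after}")
        elif after is None:
            events.append(f"{company} removed {field}: {before}")
        elif not is_same(before, after, threshold):
            events.append(f"{company} changed {field}: {before} -> {after}")
    return events


if __name__ == "__main__":
    old = {"ceo": "Jane Smith", "hq": "Austin, TX"}
    new = {"ceo": "John Doe", "hq": "Austin,  TX"}  # hq differs only by a space
    print(diff_profiles("Acme", old, new))
    # ['Acme changed ceo: Jane Smith -> John Doe']
```

A field-level diff like this is what makes a "company X changed CEO" style record possible; a plain page diff would also flag layout and boilerplate edits.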

------
nwsm
For those who can't RTF About page:

What is aiHitdata? aiHitdata is a massive, artificial intelligence/machine
learning, automated system that has been trained to build and update company
information from the web.

What makes the data different? aiHitdata not only extracts data. It can
monitor and understand the changes that occur on company websites; afterwards,
it records these changes as time series transactions. This information is
incredibly powerful. What are the benefits?

Most company information databases might tell you the name of a company’s CEO.
aiHitdata will tell you when the CEO changed and show the resulting
transaction, date and change details. Most query engines and company databases
simply tell you what a company currently says on its website. aiHitdata shows
you how the company has changed over time. Most company databases quickly
become out of date. aiHitdata is constantly being updated. This enables you to
understand what is happening to companies and to perform queries such as…

“show me all the engineering companies in California with a new CTO appointed
in the last 9 months.”

Hence, you can find and list companies by what they are and see what is
happening to them. This makes aiHitdata an extremely powerful tool.
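To make the example query concrete, here is one way it could be expressed if companies and change events lived in two relational tables. The schema, column names, and sample rows are hypothetical (aiHitdata's storage is not described on the page), and "the last 9 months" is approximated as 9 × 30 days for simplicity.

```python
import sqlite3
from datetime import date, timedelta
from typing import List

# Hypothetical schema: one row per company, one row per detected change event.
SCHEMA = """
CREATE TABLE companies (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    industry TEXT,
    state    TEXT
);
CREATE TABLE transactions (
    company_id  INTEGER REFERENCES companies(id),
    field       TEXT NOT NULL,  -- which profile field changed, e.g. 'cto'
    old_value   TEXT,
    new_value   TEXT,
    detected_on TEXT NOT NULL   -- ISO date string, so text comparison sorts by date
);
"""

QUERY = """
SELECT DISTINCT c.name
FROM companies AS c
JOIN transactions AS t ON t.company_id = c.id
WHERE c.industry = ? AND c.state = ?
  AND t.field = 'cto' AND t.detected_on >= ?
ORDER BY c.name
"""


def new_cto_companies(conn: sqlite3.Connection, industry: str, state: str,
                      today: date, months: int = 9) -> List[str]:
    # Approximate "the last N months" as N * 30 days to keep the sketch simple.
    cutoff = (today - timedelta(days=30 * months)).isoformat()
    return [name for (name,) in conn.execute(QUERY, (industry, state, cutoff))]


def build_demo_db() -> sqlite3.Connection:
    """In-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO companies VALUES (?, ?, ?, ?)", [
        (1, "Acme Robotics", "engineering", "CA"),
        (2, "Beta Bridges", "engineering", "CA"),
        (3, "Gamma Foods", "food", "CA"),
    ])
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", [
        (1, "cto", "A. Old", "B. New", "2024-03-01"),  # recent: matches
        (2, "cto", "C. Old", "D. New", "2022-01-15"),  # too old
        (3, "cto", "E. Old", "F. New", "2024-04-01"),  # wrong industry
    ])
    return conn


if __name__ == "__main__":
    print(new_cto_companies(build_demo_db(), "engineering", "CA", date(2024, 6, 1)))
    # ['Acme Robotics']
```

Storing changes as dated rows, rather than only the latest profile, is what lets a query filter on *when* something happened.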

What else is different about aiHitdata? It is up to date, and we mean really
up to date. There are no humans involved in compiling its data. It is all
fully automated. aiHitdata’s servers scour the Internet continually, 24/7,
monitoring and updating company data.
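Continuous monitoring at this scale implies some kind of re-crawl schedule. The sketch below is an assumption, not aiHitdata's documented design: sites sit in a min-heap keyed by their next crawl day, each fetched page is hashed, and only pages whose hash changed are handed on for re-extraction. `fetch` is a hypothetical callable, days are plain integers, and the 30-day default interval simply echoes the monthly cycle mentioned later in the thread.

```python
import hashlib
import heapq
from typing import Callable, Dict, List, Tuple


def fingerprint(html: str) -> str:
    """Hash the page content so an unchanged page can skip re-extraction."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def recrawl_due(due: List[Tuple[int, str]], fetch: Callable[[str], str],
                last_hash: Dict[str, str], today: int,
                interval_days: int = 30) -> List[str]:
    """Re-crawl every site whose next visit is due; return URLs whose content changed.

    `due` is a min-heap of (next_crawl_day, url); days are plain integers here.
    """
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    changed = []
    while due and due[0][0] <= today:
        _, url = heapq.heappop(due)
        digest = fingerprint(fetch(url))
        previous = last_hash.get(url)
        if previous is not None and previous != digest:
            changed.append(url)  # content differs: re-run extraction and diffing
        last_hash[url] = digest
        heapq.heappush(due, (today + interval_days, url))  # schedule the next visit
    return changed


if __name__ == "__main__":
    pages = {"a.example": "<p>CEO: Jane</p>", "b.example": "<p>Hello</p>"}
    due = [(0, "a.example"), (0, "b.example")]
    heapq.heapify(due)
    hashes: Dict[str, str] = {}
    print(recrawl_due(due, pages.__getitem__, hashes, today=0))   # [] (first visit)
    pages["a.example"] = "<p>CEO: John</p>"
    print(recrawl_due(due, pages.__getitem__, hashes, today=30))  # ['a.example']
```

A raw page hash also changes on trivial edits such as timestamps or rotating banners, so in practice it would likely serve only as a cheap first filter before a field-level comparison.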

How does it work? This is complex, but to give an idea…

aiHitdata is continually building and refining its own URL map of the web. Its
intelligent crawlers then extract the companies from this map (aiHitdata finds
approximately 30,000 new companies each week). It identifies, categorises and
extracts all of the Key Fields (see below) it finds on each company site. It
then checks, quality-assures and stores them. Next, it re-crawls each company
site periodically, noting changes. When it spots a change, it records this as
a time series transaction in its database. The above has resulted in the
database that today:

has more than 15m companies (with the number of records growing each day); has
in excess of 500m historical company change events (transactions); and is
adding more than 13m new company change events per month.

How accurate is aiHitdata? Part of aiHitdata’s AI/ML system is an automated QA
management array. This is one of the most complex parts of the system.
aiHitdata monitors both completeness and accuracy of its data. It is focused on
keeping transaction data quality at a very high level. This enables it to be
used for predictive analytics.

aiHitdata collects and monitors more than 100 different fields of data. For
key fields it is achieving and maintaining an accuracy rate in excess of 90%.
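The page does not say how completeness and accuracy are actually measured. One common approach, sketched here purely as an assumption, is to score extracted key fields against a reference sample whose correct values are known (how such a sample would be obtained is not stated). The names `reference` and `fields_below_target`, and the case-insensitive exact match, are hypothetical; the 0.9 target just mirrors the 90% figure above.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical profile: field name -> value


def _norm(value: str) -> str:
    return value.strip().lower()


def completeness(profiles: List[Record], field: str) -> float:
    """Share of profiles in which the field was extracted at all (non-empty)."""
    if not profiles:
        return 0.0
    return sum(1 for p in profiles if p.get(field)) / len(profiles)


def accuracy(extracted: Dict[str, Record], reference: Dict[str, Record],
             field: str) -> float:
    """Share of reference companies whose extracted value matches the known-correct one."""
    checked = [cid for cid, rec in reference.items() if field in rec]
    if not checked:
        return 0.0
    hits = sum(1 for cid in checked
               if _norm(extracted.get(cid, {}).get(field, "")) == _norm(reference[cid][field]))
    return hits / len(checked)


def fields_below_target(extracted: Dict[str, Record], reference: Dict[str, Record],
                        fields: List[str], target: float = 0.9) -> List[str]:
    """Key fields whose measured accuracy is under the target rate."""
    return [f for f in fields if accuracy(extracted, reference, f) < target]


if __name__ == "__main__":
    extracted = {"c1": {"ceo": "Jane Smith", "phone": "555-0100"},
                 "c2": {"ceo": "Bob Lee"}, "c3": {"ceo": "Ann Wu"}}
    reference = {"c1": {"ceo": "jane smith", "phone": "555-0199"},
                 "c2": {"ceo": "Bob Lee"}}
    print(fields_below_target(extracted, reference, ["ceo", "phone"]))  # ['phone']
```

Keeping completeness and accuracy as separate numbers matters: a field can be filled in for nearly every company and still be wrong, or be accurate but rarely extracted.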

------
ajawee
How does it work?

~~~
jo_kruger
We have a platform running on more than 50 servers, which includes crawlers,
website categorizers, entity extractors, etc. It scans about 30 million
websites (categorized as corporate websites with content in English) on a
regular basis. The goal is to build a structured profile for every known
company website and repeat this process every month in order to compare
historical profiles and generate "transactions" like "company X changed CEO",
etc.

