
Show HN: NLP-based tool for technology research - BLP4YC
https://www.researchly.app/analytics/dashboard
======
BLP4YC
To avoid any confusion: I did not come up with most of these features. They
are based on these papers (hopefully I have not forgotten any):

* Exploring technological opportunities by linking technology and products: Application of morphology analysis and text mining (Byungun Yoon, Inchae Park, Byoung-youl Coh)
* Technology opportunity discovery (TOD) from existing technologies and products: A function-based TOD framework (Janghyeok Yoon, Hyunseok Park, Wonchul Seo, Jae-Min Lee, Byoung-youl Coh, Jonghwa Kim)
* Investigating technology opportunities: the use of SAOx analysis (Kyuwoong Kim, Kyeongmin Park, Sungjoo Lee)
* Identification and monitoring of possible disruptive technologies by patent-development paths and topic modeling (Abdolreza Momeni, Katja Rost)
* A New Product Growth for Model Consumer Durables (Frank M. Bass)
* Identifying rapidly evolving technological trends for R&D planning using SAO-based semantic patent networks (Janghyeok Yoon, Kwangsoo Kim)
* Innovation hotspots in food waste treatment, biogas, and anaerobic digestion technology: A natural language processing approach (Djavan De Clercq, Zongguo Wen, Qingbin Song)
* TrendPerceptor: A property–function based technology intelligence system for identifying technology trends from patents (Janghyeok Yoon, Kwangsoo Kim)

~~~
Endlessly
When looking at a paper, how would you decide if it was of use to you?

~~~
BLP4YC
It is a combination of:

How interesting I find the paper's ideas.

Would what the paper proposes work with large data sets and can what the paper
proposes be (fully) automated.

Can I implement it (do I understand it enough, do I have access to the data)?

~~~
Endlessly
Curious: it appears you used the same system for technical analysis of crypto
ICO valuations. If so, what, if anything, did you learn that surprised you
most as it relates to emergent tech valuations?

~~~
BLP4YC
Yes, absolutely right. The core tech behind the crypto valuations and the
tech valuations stayed the same. It was basically Python + spaCy (plus a lot
of different supporting libraries).

Some surprising aspects:

For tech-analysis, you can accomplish a lot with rule-based parsing because
the source texts (e.g. patents) share the same sentence structure. In fact,
several studies have shown that patent text follows a certain structure
(e.g. SAO, i.e. Subject-Action-Object).
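
To illustrate what a rule-based SAO rule can look like, here is a toy sketch. It is not the parser used in researchly: it uses a plain regex instead of a spaCy dependency parse, and the verb list `ACTIONS`, the function name `extract_sao`, and the example sentence are all made up for illustration.

```python
import re

# Hypothetical, hand-picked action verbs; a real system would take the verb
# from a dependency parse instead of a fixed list.
ACTIONS = ("transmits", "receives", "heats", "stores", "converts")

# Toy rule: "<article> <subject> <action> <article> <object>."
PATTERN = re.compile(
    r"^(?:the|a|an)\s+(?P<subject>.+?)\s+"
    r"(?P<action>" + "|".join(ACTIONS) + r")\s+"
    r"(?:the|a|an)\s+(?P<object>.+?)[.]?$",
    re.IGNORECASE,
)


def extract_sao(sentence):
    """Return a (subject, action, object) tuple, or None if the rule does not fit."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return tuple(match.group(name).lower() for name in ("subject", "action", "object"))


print(extract_sao("The antenna transmits the radio signal."))
print(extract_sao("Crypto whitepapers are all over the place"))
```

A single rule like this only works because the input is so regular, which is the point made above about patents; free-form text falls through and returns `None`.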

For crypto, this was far more difficult as the text structures were all over
the place.

Also, crypto-analysis ("back then") was very messy because it was difficult to
find a trustworthy data set. With technologies you can confine it to
Wikipedia, patents, scientific papers. There is still a lot to analyze, but at
least you have a somewhat official data set.

Also, with crypto you have far fewer data points per company/token/coin, which
makes it hard for a machine not to disregard them as noise.

Conversely, with tech-evaluation it seems that, because you get so much data
from one document (e.g. one patent), you can often disregard a (big?) portion
of it and still end up with good results.

Additionally, it seems to me that crypto-analysis was supposed to be far more
numbers-heavy (how much funding etc.), and thus the tolerance for error was
relatively small. E.g. if you miss one funding round (out of three), the
company's valuation can change by up to 50%. This happened to me basically all
the time, which was super frustrating.
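
A quick back-of-the-envelope check of that sensitivity, as a sketch only: it assumes the valuation proxy is simply the sum of known funding rounds, and the amounts and the name `valuation_error` are hypothetical.

```python
def valuation_error(rounds, missed_index):
    """Fraction of the total that disappears when one funding round is missed.

    Assumes (for illustration only) that the valuation proxy is the plain
    sum of all funding rounds.
    """
    return rounds[missed_index] / sum(rounds)


# Hypothetical amounts for three funding rounds.
rounds = [1_000_000, 1_000_000, 2_000_000]

print(valuation_error(rounds, 2))  # largest round missed -> 0.5
print(valuation_error(rounds, 0))  # a smaller round missed -> 0.25
```

With only three data points, a single missed record moves the result by a quarter to a half, so there is no room for the kind of noise that a patent corpus tolerates.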

The last surprising fact was how difficult and complicated keyword extraction
is. For crypto evaluation I just went with relative word frequency (the more a
word appears in a text the more important it becomes, assuming it does not
appear in all the documents). However, as I have learnt with tech-evaluation,
there are maybe four or five strategies for keyword extraction. And this is
still an area where I have not found a solid solution for my NLP case.
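
The relative-word-frequency idea described above is essentially TF-IDF. A minimal pure-Python sketch, assuming whitespace tokenization and made-up example documents (the function name `tfidf_keywords` is hypothetical, and this is not the code used for the crypto evaluation):

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_n=3):
    """Rank the words of one document by term frequency times inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Number of documents each word appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    tokens = tokenized[doc_index]
    counts = Counter(tokens)
    # A word that appears in every document gets log(1) = 0, i.e. no weight.
    scores = {
        word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda word: (-scores[word], word))[:top_n]


docs = [
    "the token funds the token sale",
    "the patent describes the sensor",
    "the paper cites the patent",
]
print(tfidf_keywords(docs, 0))
```

Here "the" is frequent but appears in every document, so it scores zero, while "token" is frequent in only one document and ranks first.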

Finally, after all the reading that went into building researchly (which is
relatively little), I have realized that I know significantly less about NLP
than I initially thought. It still fascinates me what kinds of
strategies/algorithms people come up with.

~~~
Endlessly
For keyword extraction, since you’re already using spaCy, you might take a
look at textacy, specifically:

textacy.ke.utils.most_discriminating_terms

Documentation is here:

https://chartbeat-labs.github.io/textacy/build/html/api_reference/information_extraction.html

Code is based on this research:

King, Gary, Patrick Lam, and Margaret Roberts. “Computer-Assisted Keyword and
Document Set Discovery from Unstructured Text.” (2014).
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.458.1445&rep=rep1&type=pdf

——

Thanks for the depth of your replies. I love mapping information, finding
patterns, “seeing the future”... though more often than not, even if you knew
some aspect of the future for sure, it would still be very hard to make use of
it. I mainly enjoy the topic as an info geek more than anything else.

Again, thanks for sharing!

------
BLP4YC
What it does: Uses NLP to extract technology-related information from patents,
Wikipedia etc. and then analyzes these technologies.

Technology-related information means: functions (what the technology can do)
and properties/components (what the technology is made up of).

------
xtiansimon
I've toyed with scripts from my NLP research, but nothing to this extent.
Kudos.

The webpage looks clean. The ratio of graphic elements to their scale is
pleasant. The two-tone color palette is pleasant. The line length in the main
screen (~140 c/l) is a bit much for my taste.

As for the "Try now" examples, I'm not sure how they relate to each other.
Maybe a video demo walk-through taking one technology, such as the digital
watch (with which we are all familiar), and walking it through the site?

As a "Try now" user, until I can throw in any random data and see it propagate
through your product, I won't realize what you've accomplished.

~~~
BLP4YC
Thanks! I have been experimenting with NLP for two/three years now and this
is finally something somewhat useful. So your comment makes me really happy!

Thanks for your feedback regarding the main page. I also like the design, not
because I made it, but because the developers behind the templates did such a
great job: https://startbootstrap.com/themes/sb-admin-2/ +
https://blackrockdigital.github.io/startbootstrap-landing-page/

I probably should give them credit more prominently (right now, I do it only
in the source code).

Regarding the "try now": I never thought that this would be an issue. On the
contrary, I thought that different data across different features would
enable people to better understand the variety of the product. But what you
are saying makes absolute sense. I have added your points to my to-do list.

Again, thanks a lot for your feedback - it really helped.

~~~
Endlessly
Agree with the feedback on being able to take raw inputs, even a manually
selected set of documents such as wiki pages, patents, research papers, etc.,
and see both the high-level workflow and, step by step, the transformations
and related code.

Clearly, that’s asking a lot in terms of both work and IP disclosure, but I am
guessing it is what the average user would want: they have a specific need
based on existing documents that they want ingested for analysis. Maybe I am
wrong, but I agree: when taking a quick look, that, aside from the raw data &
code, was what I thought to look for to try and understand what was really
there.

------
crsn
I’d like to chat more about this (including perhaps hiring you for a related
project). Mind emailing me? carson@carsonkahn.com

------
_____smurf_____
Nice work! What do you use for the front end and charts?

------
omeysalvi
Who is the target market? Creators and inventors?

~~~
BLP4YC
Please, tell me. Kidding, but also not. Honestly, I am still unsure. I started
working on it because I came across those features in different papers (if I
remember correctly - by accident) and thought they were cool. So I started
working on it.

But "creators and inventors" goes in the direction I was thinking.
Additionally, innovation consultants, tech. advisors, "policy makers" or
people who have to decide on technology but lack technical expertise in
certain fields, engineers...

~~~
Endlessly
If I had to guess: legal due diligence (acquisitions, filings, processing
filings), competitive intelligence, large corporations with established R&D or
IP efforts, financial intelligence, etc. would all be markets worth looking
into.

Creators & inventors tend to want a lot of freebies in my experience.

