This is extremely pertinent to me, as I have written my own "semantic search engine" and extractive summarizer using word vectors. It is available here:
https://www.nlpsearch.io
So far it uses a 100D GloVe model (very simple, and not all that good compared to BERT et al.) for performance reasons, and because I'm hosting it for free on AWS.
A much more fleshed-out version of my extractive summarizer, with a CLI supporting all the major language models (like GPT-2), is also available here:
https://github.com/Hellisotherpeople/CX_DB8
I love reading about how 0x65.dev is dealing with issues I've run into. Trying to index large numbers of text vectors is a headache for me, and at the moment I saw this, I was writing code to get the popular approximate-nearest-neighbors library "Annoy" working with my search engine.
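For context, here's a minimal sketch of the exact cosine lookup that a library like Annoy approximates; the function name and shapes are my own invention, not from Annoy or my project (pure NumPy brute force, which is what the approximate index replaces at scale):

```python
import numpy as np

def top_k_cosine(query, doc_matrix, k=5):
    """Exact top-k cosine search over all document vectors.

    This is the brute-force lookup that Annoy trades for an
    approximate (but much faster) tree-based index.
    """
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = docs @ q                 # cosine similarity to every document
    return np.argsort(-sims)[:k]    # indices of the k most similar docs
```

The brute-force version is O(n) per query, which is exactly why an approximate index becomes necessary once the corpus gets large.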
As for the issue other HN users noted ("Isn't average pooling not that great?"), the answer is "Yeah, we know, but there's practically nothing better." The authors mentioned using a weighted average of vectors, which I assume means weighting the word vectors by tf-idf. I've read several papers suggesting this is not necessarily better than plain averaging. I've found some success with concatenating average-, max-, and min-pooled vectors together before cosine similarity searches, but it's certainly not a real fix and is much slower for not a lot of gain.
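A minimal sketch of that mean/max/min concatenation trick, assuming NumPy and word vectors already looked up from the embedding model (the function names are mine):

```python
import numpy as np

def pool_embed(word_vecs):
    """Concatenate mean-, max-, and min-pooled word vectors (d -> 3d)."""
    m = np.asarray(word_vecs)
    return np.concatenate([m.mean(axis=0), m.max(axis=0), m.min(axis=0)])

def cosine(a, b):
    """Cosine similarity between two pooled document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

The tripled dimensionality is part of why it's slower: every similarity comparison now runs over 3d components instead of d.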
If anyone is a total nerd about word/document vectors and wants to chat / brainstorm about applying text vectors to search / extractive summarization - please contact me! I'm desperate to work with others on this problem!
I tried a weighted average too, but in my case the weights were computed by taking the dot product between every pair of word vectors in the phrase (an m x m similarity matrix), then averaging over the rows for each word and normalising. Kind of like a poor man's Transformer: it boosts words that are supported by other similar words in the same phrase.
Then you can take a weighted sum of the vectors, or use the ranks to select the most related words. It's also possible to run spectral clustering on the similarity matrix (sims_mat) to pull out the main topics of the text; that works quite well.
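A rough sketch of that weighting scheme, assuming NumPy with one row vector per word; the name sims_mat follows the description above, but the exact normalisation step is my guess:

```python
import numpy as np

def self_sim_weights(word_vecs):
    """Weight each word by how strongly the rest of the phrase supports it."""
    m = np.asarray(word_vecs, dtype=float)
    sims_mat = m @ m.T                  # m x m pairwise dot products
    weights = sims_mat.mean(axis=1)     # average support for each word
    weights = weights / weights.sum()   # normalise (assumes mostly positive sims)
    doc_vec = weights @ m               # weighted sum of the word vectors
    return weights, doc_vec
```

The same sims_mat would be the input to the spectral-clustering step for topic extraction.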