
Bistring – Bidirectionally Transformed Strings - varunagrawal
https://github.com/microsoft/bistring
======
zawerf
I was confused about the intended use case, but there's more information in the
docs folder:
[https://github.com/microsoft/bistring/blob/master/docs/Intro...](https://github.com/microsoft/bistring/blob/master/docs/Introduction.rst)

Apparently it's for machine learning tasks where you want to pick out a
span/substring of the original text, but your model can only accept normalized
text (I'm guessing for things like replacing out-of-vocabulary words with
UNK/unknown tokens). The library solves that problem by keeping track of the
index mapping between the original text and the transformed text.

(Picking out spans is a very common task in NLP; for example, see the SQuAD
dataset:
[https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/...](https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/Normans.html))
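To make the index-mapping idea concrete, here is a minimal hand-rolled sketch in Python. It is not bistring's API: `normalize_with_offsets` and `original_span` are hypothetical helpers, and the sketch assumes whitespace tokenization, lowercasing, and a toy vocabulary in which unknown words become `<unk>`. Each normalized token remembers the character offsets it came from, so a span a model predicts over the tokens can be sliced back out of the untouched original:

```python
import re

# Hypothetical helpers for illustration; this is not bistring's API.

def normalize_with_offsets(text, vocab):
    """Lowercase each whitespace-separated token, replace out-of-vocabulary
    tokens with "<unk>", and record each token's (start, end) character
    offsets in the original text."""
    tokens, offsets = [], []
    for m in re.finditer(r"\S+", text):
        word = m.group().lower()
        tokens.append(word if word in vocab else "<unk>")
        offsets.append((m.start(), m.end()))
    return tokens, offsets

def original_span(text, offsets, first, last):
    """Map an inclusive token span [first, last], e.g. one predicted by a
    model over the normalized tokens, back to the original substring."""
    return text[offsets[first][0]:offsets[last][1]]

text = "The Normans settled in  Normandy in the 10th century."
vocab = {"the", "normans", "settled", "in"}
tokens, offsets = normalize_with_offsets(text, vocab)
print(tokens)
# ['the', 'normans', 'settled', 'in', '<unk>', 'in', 'the', '<unk>', '<unk>']
print(repr(original_span(text, offsets, 1, 4)))
# 'Normans settled in  Normandy'
```

The `<unk>` at position 4 maps back to "Normandy", and the double space before it survives because the answer is sliced from the original string rather than rebuilt from the normalized tokens.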

~~~
tavianator
Yeah, that's the idea. I'm not sure exactly how to communicate the purpose of
the library in a few words. Possibly with an animation that shows a span of
text being selected and the corresponding span highlighted in the original text.

The formatted documentation can be viewed here btw:
[https://bistring.readthedocs.io/en/latest/Introduction.html](https://bistring.readthedocs.io/en/latest/Introduction.html)

------
andrewflnr
Somewhat related: Boomerang
[https://www.seas.upenn.edu/~harmony/](https://www.seas.upenn.edu/~harmony/)
Discussed here at least once:
[https://news.ycombinator.com/item?id=565874](https://news.ycombinator.com/item?id=565874)

The title made me think of Boomerang, but this looks like it has rather
different use cases in mind.

------
blt
This is interesting, but the readme doesn't say much about use cases. What is
a big application that could benefit from this?

~~~
microcolonel
Normalizing strings for comparison while keeping the original source version of
whatever substring matched or was extracted from the normalized version. That's
useful for search boxes and such. Not sure what _their specific use case_ is,
but it's something I've implemented before, so I know it _has_ use cases.
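A rough sketch of that search-box case in Python, again hand-rolled rather than using bistring (`fold` and `find_original` are hypothetical names). It folds case and strips accents while recording, for every folded character, which original character produced it, then searches the folded text and slices the match out of the original. It assumes the input uses precomposed characters; decomposed input (a letter followed by a separate combining accent) would need extra care at match boundaries.

```python
import unicodedata

# Hypothetical helpers for illustration; this is not bistring's API.

def fold(text):
    """Case- and accent-fold text. Returns the folded string plus, for each
    folded character, the index of the original character it came from."""
    out, origin = [], []
    for i, ch in enumerate(text):
        # NFD splits e.g. "é" into "e" plus a combining accent, which we drop.
        for c in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(c):
                continue
            # casefold() may expand one character into several ("ß" -> "ss").
            for f in c.casefold():
                out.append(f)
                origin.append(i)
    return "".join(out), origin

def find_original(haystack, needle):
    """Search in folded text; return the matching slice of the original."""
    folded, origin = fold(haystack)
    target, _ = fold(needle)
    if not target:
        return None
    start = folded.find(target)
    if start < 0:
        return None
    end = start + len(target)
    # Map the first and last folded characters back to original indices.
    return haystack[origin[start]:origin[end - 1] + 1]

menu = "Crème Brûlée at the Straße café"
print(find_original(menu, "creme brulee"))  # Crème Brûlée
print(find_original(menu, "STRASSE"))       # Straße
print(find_original(menu, "pizza"))         # None
```

Keeping a per-character origin list is the simplest way to handle transformations that change length, such as "ß" folding to "ss", since the matched span's end still maps back to a single original character.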

