
Field Linguist's Toolbox - Tomte
https://software.sil.org/toolbox/
======
mcswell
For many people, Toolbox has long since been replaced by SIL's FieldWorks
Language Explorer (FLEx). Toolbox allows you to get inconsistent
representations of lexemes; FLEx effectively prevents that. OTOH, if you're
doing something other than the usual sort of dictionary, Toolbox might be a good
choice.

Disclaimer: I helped develop the morphology and phonology model in FLEx, and
an earlier version of one of the two morphological parsers in FLEx (Hermit
Crab), so I'm not an unbiased observer.

------
Rotten194
There's a really cool pipeline for shepherding Toolbox-cataloged utterances
and definitions into Delphin-project symbolic grammars to basically auto
generate morphological models & do TDD on symbolic grammars. Took a class from
Emily Bender where we built a grammar of Chintang with this system, and were
able to do simple machine translation via an intermediary language to other
grammars built in the class for e.g. Nuuchahnulth and other small languages.
It's a really cool system for machine understanding of languages that never
would have enough tagged text for a machine learning system.

~~~
yorwba
When you say "basically auto generate morphological models" how automatic is
the process exactly? I've been asked to help write a stemmer for Berber (which
I do not speak) for use on [https://tatoeba.org](https://tatoeba.org) and I'm
wondering whether this Toolbox and/or Delphin
([http://www.delph-in.net](http://www.delph-in.net), right?) would be useful for that.

~~~
mcswell
For morphology (esp. languages with complicated morphologies, like Berber),
the tool that most computational linguists reach for is finite state
transducers, particularly those that are built for use in morphology and
phonology. An early one of these was the Xerox xfst/lexc program, which has
since been re-implemented in open source form as Foma
([https://fomafst.github.io/](https://fomafst.github.io/)). The book on
xfst/lexc,
[https://www.press.uchicago.edu/ucp/books/book/distributed/F/...](https://www.press.uchicago.edu/ucp/books/book/distributed/F/bo3613750.html),
is probably still the best place to go for a tutorial. Other FST programs that
have been used for morphology and phonology include the Stuttgart FST (sfst,
[https://www.ims.uni-stuttgart.de/forschung/ressourcen/werkze...](https://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/sfst/)) and the Helsinki HFST
([http://hfst.github.io/](http://hfst.github.io/)). HFST allows the use of
weights, which can be useful for spell correction.
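
To make the transducer idea concrete, here is a toy sketch in plain Python. It is not the Foma, xfst/lexc, SFST, or HFST API; the `FST` class, `build_toy_analyzer`, the two English stems, and the `+N`/`+Sg`/`+Pl` tags are all made up for illustration. It only shows the core mechanism: arcs pair an input symbol with an output symbol, so walking a surface form through the machine yields its analysis.

```python
from collections import defaultdict

EPS = ""  # epsilon: an arc that consumes no input symbol


class FST:
    """Minimal nondeterministic transducer (assumes no epsilon cycles)."""

    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)
        self.arcs = defaultdict(list)  # state -> [(in_sym, out_sym, next_state)]

    def add_arc(self, src, in_sym, out_sym, dst):
        self.arcs[src].append((in_sym, out_sym, dst))

    def apply(self, word):
        """Return every output string the transducer pairs with `word`."""
        results = []
        stack = [(self.start, 0, "")]  # (state, input position, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word) and state in self.finals:
                results.append(out)
            for in_sym, out_sym, dst in self.arcs[state]:
                if in_sym == EPS:
                    stack.append((dst, pos, out + out_sym))
                elif pos < len(word) and word[pos] == in_sym:
                    stack.append((dst, pos + 1, out + out_sym))
        return sorted(results)


def build_toy_analyzer():
    """Hypothetical two-noun lexicon: stem, then +N, then +Sg or +Pl."""
    fst = FST(start=0, finals=[100])
    next_state = 1
    for stem in ("cat", "dog"):
        state = 0
        for ch in stem:
            fst.add_arc(state, ch, ch, next_state)  # copy stem letters through
            state = next_state
            next_state += 1
        fst.add_arc(state, EPS, "+N", 99)  # end of stem: emit the category tag
    fst.add_arc(99, "s", "+Pl", 100)   # surface "s" maps to the plural tag
    fst.add_arc(99, EPS, "+Sg", 100)   # no suffix maps to the singular tag
    return fst
```

With this toy lexicon, `build_toy_analyzer().apply("cats")` gives `["cat+N+Pl"]`, and an unknown word such as `"cow"` gives an empty list. The real tools add what this sketch leaves out: compiling lexicons and rewrite rules into one machine, running it in both directions, and (in HFST's case) weights on the arcs.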

I've built morph parsers with all of these except HFST, although that's
next on my list.

I'm not familiar with Delphin, but a quick glance at their website implies
that it's for syntax, not so much for morphology. They mention a Japanese
grammar implemented in Delphin, but it uses a separate tool for morphology.

In answer to your other question, the last time I looked, machine learning of
morph parsers (or stemmers, which are like morph parsers that throw away the
affixal information) is reasonably good for "fusional" morphologies, which
most modern Indo-European languages have. I don't think the state-of-the-art ML
would work well for Berber, because of its much more complex morphology.
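
A rough Python sketch of the parser/stemmer distinction, assuming a made-up three-suffix English table (`SUFFIXES`, `parse`, and `stem` are illustrative names, not from any real library). The parser returns the stem plus the affix tags; the stemmer runs the same analysis and keeps only the stem.

```python
# Hypothetical toy suffix table; a real analyzer would be far richer.
SUFFIXES = {"ing": "+Prog", "ed": "+Past", "s": "+3Sg"}


def parse(word):
    """Morph parser: return (stem, tags) for the first matching suffix."""
    for suffix, tag in SUFFIXES.items():
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], [tag]
    return word, []


def stem(word):
    """Stemmer: same analysis, but the affixal information is discarded."""
    stem_form, _tags = parse(word)
    return stem_form
```

Here `parse("walked")` returns `("walk", ["+Past"])` while `stem("walked")` returns just `"walk"`. Plain suffix stripping like this is exactly what breaks down once the morphology gets more complex.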

~~~
Rotten194
The Delph-in system has a morphological model based on position classes.
Morphology items are treated similarly to syntax items, in that they can hold
constraints on & apply features to what they attach.
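
As a loose illustration of the position-class idea (this is plain Python, not Delph-in code, and the stem, affixes, and feature names are all invented): each class is an ordered slot, and each affix in it can require features on what it attaches to and add features of its own.

```python
from dataclasses import dataclass, field


@dataclass
class Affix:
    form: str
    adds: dict                                    # features the affix contributes
    requires: dict = field(default_factory=dict)  # features the host must already have


@dataclass
class PositionClass:
    name: str
    affixes: list
    optional: bool = True


def generate(stem, stem_features, position_classes):
    """Enumerate every suffixed form licensed by the ordered position classes."""
    forms = [(stem, dict(stem_features))]
    for pc in position_classes:
        next_forms = []
        for form, feats in forms:
            if pc.optional:
                next_forms.append((form, feats))  # leave this slot empty
            for affix in pc.affixes:
                if all(feats.get(k) == v for k, v in affix.requires.items()):
                    next_forms.append((form + affix.form, {**feats, **affix.adds}))
        forms = next_forms
    return forms


# Invented example: an obligatory tense slot followed by an optional
# evidential slot whose only affix attaches to past-tense hosts.
DEMO_CLASSES = [
    PositionClass("tense", [Affix("ta", {"tense": "past"}),
                            Affix("ni", {"tense": "fut"})], optional=False),
    PositionClass("evidential", [Affix("ri", {"evid": "reported"},
                                       requires={"tense": "past"})]),
]
```

Calling `generate("kan", {"pos": "verb"}, DEMO_CLASSES)` yields the forms `kanta`, `kantari`, and `kanni`; `kanniri` is blocked because the evidential requires a past-tense host.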

~~~
mcswell
Interesting, I hadn't seen that. Does it handle phonologically-conditioned
allomorphy or inflection classes? Stem allomorphy conditioned by phonology or
position in the paradigm (like the stem allomorphs of the Spanish verbs
'tener', 'crecer' etc.)?

------
nmstoker
Several aspects of this seem rather outdated from a software perspective (NB:
I'm not commenting on potential utility for language work). It's Windows-only
(just emulated on Mac/Linux), shows several other signs of being really old
(who touts a DBMS like it's the 90s, even if that's what's behind something?),
and crucially it's not open source.

