
Show HN: Stu – Build automation / make replacement for data mining projects - kunegis
https://github.com/kunegis/stu
======
brudgers
Related Blog with usage examples:
[https://networkscience.wordpress.com/2016/04/01/stu-build-automation-for-data-mining-projects/](https://networkscience.wordpress.com/2016/04/01/stu-build-automation-for-data-mining-projects/)

------
kunegis
Hello everyone; the defining feature of Stu is that it allows you to
parametrize rules and to execute a rule over a set of values for each
parameter. This can be achieved more or less easily with other Make-like
tools, but Stu makes it the central concept and therefore has a clean syntax
for it. Stu is mainly used in data mining projects, where we often iterate
over many datasets, many variants of algorithms, etc.

~~~
rout39574
So, I'm torn, because there's really something to scratching your itch your
way.

But there's a saying: A year in the lab saved me hours in the library. It
applies to code, too.

Make has some profound science and experience crystallized in its approach to
the world. Anyone starting down their road with a new code base is making a
strong statement, whether they realize it or not.

I'm not sure I know what you mean by 'parameterized rules'. But if you mean
something as simple as 'a target to be built can imply more than one token to
be used in the rules to generate it', then I think you're reinventing wheels.

It's a lot of fun to build a green field that does just what you (think you)
want, and then take the first few steps to generalize it, and think "I've
built a new tool!". But somewhere in there you'll become an old fart, and
wonder why These Kids keep on re-inventing tools that have been written well,
and worn butter-smooth by the passage of many nerds.

Meh. Take a look at how you can calculate multiple tokens from a target
string:

[https://github.com/rout39574/make_an_example](https://github.com/rout39574/make_an_example)

It may not be your cup of tea.
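The kind of token-splitting that repo demonstrates can be sketched in GNU
Make with the `$(subst)` and `$(word)` text functions (the target naming here
is invented for illustration):

```make
# Recover multiple tokens from a single target name.
# For the target list.net1.pagerank.txt, the stem pieces are
# word 2 = net1 (network) and word 3 = pagerank (method).
list.%.txt:
	@echo network=$(word 2,$(subst ., ,$@)) method=$(word 3,$(subst ., ,$@))
```

So one `%` stem plus some string surgery can stand in for multiple named
parameters, at the cost of readability.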

old fart out.

~~~
kunegis
Hi; a parametrized rule in Stu would be something such as:

list.$NETWORK.$METHOD.$STYLE.txt:  data/$NETWORK.txt  compute_$METHOD.py  style_$STYLE.css
{
    python compute_$METHOD.py --dataset data/$NETWORK.txt --style style_$STYLE.css
}

GNU Make supports individual parameters with '%', and Cook [1] supports
multiple parameters with '%1'/'%2'/etc.
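For comparison, GNU Make's '%' gives you exactly one stem per pattern rule,
so only one of the three parameters above can vary freely (file names here
are invented):

```make
# GNU Make pattern rule: a single '%' stem, reused via $* in the recipe
list.%.txt: data/%.txt compute.py
	python compute.py --dataset data/$*.txt > $@
```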

The second defining feature of Stu is what we call 'dynamic dependencies',
which work like this:

@all: [dep];

>dep: NETWORKS.txt METHODS.txt STYLES.txt
{
    for NETWORK in $(cat NETWORKS.txt) ; do
        for METHOD in $(cat METHODS.txt) ; do
            for STYLE in $(cat STYLES.txt) ; do
                echo list.$NETWORK.$METHOD.$STYLE.txt
            done
        done
    done
}

I.e., a filename in brackets is first built, and then Stu will parse its
content as a list of dependencies.
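To make the mechanism concrete, here is a plain-shell sketch of what the
'dep' rule produces; the network/method/style values are made up for the
example:

```shell
#!/bin/sh
# Invented input lists, standing in for the real NETWORKS.txt etc.
printf '%s\n' net1 net2 > NETWORKS.txt
printf '%s\n' pagerank  > METHODS.txt
printf '%s\n' plain     > STYLES.txt

# The same nested loop as in the 'dep' rule: emit one dependency
# filename per combination of parameter values.
for NETWORK in $(cat NETWORKS.txt) ; do
    for METHOD in $(cat METHODS.txt) ; do
        for STYLE in $(cat STYLES.txt) ; do
            echo "list.$NETWORK.$METHOD.$STYLE.txt"
        done
    done
done
```

Stu then reads that output back in and treats each line as a dependency of
@all, so the set of targets can itself be computed by a rule.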

I myself have used Make for years in the KONECT project [2], and implementing
these kinds of things is possible with Make, but only with very verbose and
ugly hacks that quickly become unreadable. Also, I don't know of another tool
that has these features, and therefore I wrote one; the one that comes
closest is Cook by the late Peter Miller.

[1] [https://ftp.gnu.org/non-gnu/cook.README](https://ftp.gnu.org/non-gnu/cook.README)

[2] [http://konect.uni-koblenz.de/](http://konect.uni-koblenz.de/)

------
rout39574
Make has done 'dynamic rules' for a long, long time.

[http://make.mad-scientist.net/papers/advanced-auto-dependency-generation/](http://make.mad-scientist.net/papers/advanced-auto-dependency-generation/)

------
nivertech
How is this different from drake?

drake - Data workflow tool, like a "Make for data" (in Clojure):
[https://github.com/Factual/drake](https://github.com/Factual/drake)

