
Show HN: Plait.py – a fake data modeler - logv
https://github.com/plaitpy/plaitpy
======
Dowwie
I've found the _faker_ library [1] useful.

The only fake data that I've bothered to model is weighted age ranges.
Fortunately, as of Python 3.6, weighted selection is available via
random.choices [2] in the stdlib.

[1] [https://github.com/joke2k/faker](https://github.com/joke2k/faker)

[2]
[https://docs.python.org/3/library/random.html#random.choices](https://docs.python.org/3/library/random.html#random.choices)

example:
[https://gist.github.com/Dowwie/8409d871ddae913e44c61bc4d47ce...](https://gist.github.com/Dowwie/8409d871ddae913e44c61bc4d47ce1cc)
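
A minimal sketch of the weighted-choice idea with random.choices. The age brackets and weights below are made up for illustration and are not taken from the gist:

```python
import random

# Hypothetical age brackets and weights, chosen only for illustration.
AGE_RANGES = [(18, 24), (25, 34), (35, 49), (50, 64), (65, 90)]
WEIGHTS = [15, 25, 30, 20, 10]


def random_age(rng=random):
    # Pick a bracket according to the weights, then a uniform age inside it.
    low, high = rng.choices(AGE_RANGES, weights=WEIGHTS, k=1)[0]
    return rng.randint(low, high)


rng = random.Random(42)  # seeded so repeated runs give the same ages
ages = [random_age(rng) for _ in range(10)]
print(ages)
```

random.choices samples with replacement, so the weights only need to be relative; they do not have to sum to 100.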

~~~
logv
part of what prompted the work on plait.py was that joke2k/faker was
reasonably slow to generate 10K fake names for me:
[https://paste.ubuntu.com/26354987/](https://paste.ubuntu.com/26354987/)

PS. that's a really cool python tip!

~~~
Dowwie
Well done, then! :) Is plait a drop-in replacement for faker?

~~~
logv
it's a drop-in replacement for stympy/faker, but not joke2k/faker

joke2k/faker is python and the data is stored in code (all or most of the
random values are in .py files around the codebase), perhaps leading to its
slowness.

stympy/faker is ruby and its random values are in yaml files, with some fields
defined as ruby functions (those are not supported by plait.py).

you can use 'plait -l', 'plait -ll name' and 'plait -ll name.name' (more info
in the README) to get a list of the fake fields available.

~~~
Dowwie
if/when you tire of the latest performance improvement, consider porting to
Rust and adapting it to python via cffi

------
sfvisser
Another great approach to generating notional data is using the Haskell
QuickCheck library and specifically the Arbitrary type class. Super simple and
extremely flexible/composable.

Probably also available in other languages.

~~~
aldanor
The ones I used and can recommend:

“hypothesis” package in Python (+pytest plugin)

“rapidcheck” in C++

“quickcheck” in Rust

~~~
LrnByTeach
good list for generating model data / fake data

------
ironix
Interesting, though I must ask: Why YML instead of a DSL?

Granted, I come from Ruby, and writing DSLs is pretty typical. Maybe not so
popular in Python.

I am asking this because I become suspicious of config languages that read
like code. Isn't a bona fide programming language the better choice in this
scenario? That is, don't all overly configurable formats (e.g. Terraform .tf
files, JSON schemas...) converge on just being a new scripting language?

~~~
logv
good point! i'm not against a DSL. as I was working on plait.py, one thought
going through my head was: "am i re-writing haskell or lisp but worse"? my
experience with python and DSLs is that I need to use YACC / PLY to create a
grammar and so on, which may be a lot of work. i take it that it's easier in ruby?

yaml was a format I chose because it is easy to write (close to human), but
cannot express full programming concepts (though it allows some metaprogramming). i
did not want the templates to be fully powered, as they are meant to be able to
express relationships between variables, but not much more (especially not
side effects). they also support lazy evaluation - statements do not need to
be in order. this is closer to a "mathematical language" for me.

the choice for yaml was also based on the premise that if performance becomes
an issue, i can hopefully move to another language but retain the templates (i will
have to re-implement compatibility with python's "random" module, though)
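
To illustrate what "statements do not need to be in order" can mean, here is a rough sketch of lazily resolved fields. It is not plait.py's implementation; the LazyRecord class and the field names are hypothetical and only demonstrate the general idea of resolving a field the first time something asks for it:

```python
import random


class LazyRecord:
    """Resolve each field on first access, so definition order does not matter."""

    def __init__(self, fields):
        self._fields = fields      # field name -> function taking the record
        self._values = {}          # cache of already-resolved fields
        self._resolving = set()    # fields currently being computed

    def __getitem__(self, name):
        if name not in self._values:
            if name in self._resolving:
                raise ValueError(f"circular dependency on field {name!r}")
            self._resolving.add(name)
            try:
                self._values[name] = self._fields[name](self)
            finally:
                self._resolving.discard(name)
        return self._values[name]

    def to_dict(self):
        return {name: self[name] for name in self._fields}


rng = random.Random(7)
fields = {
    # "birth_year" refers to "age", which is only defined after it.
    "birth_year": lambda r: 2018 - r["age"],
    "age": lambda r: rng.randint(18, 90),
}
record = LazyRecord(fields).to_dict()
print(record)
```

Because each value is cached after its first computation, "age" is drawn once and "birth_year" stays consistent with it, whichever field is listed first.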

------
mikeokner
Cool, is there a way to use it to dynamically generate data (for streaming)?
Would be nice to be able to just call something like .next() and get another
record so a simulator can run for an indefinite period of time.

~~~
logv
if you create a template and keep calling .gen_record(), i think it will do
what you want. Template() does not implement python's __next__ or __iter__ at
the moment, but that's a good idea - i'm very open to diffs :-D
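
A small wrapper along these lines could give the .next()-style streaming asked about above. It assumes only that the template object has a gen_record() method, as described; StubTemplate is a hypothetical stand-in used so the sketch runs without plait.py installed:

```python
import itertools
import random


def stream_records(template):
    # Yield records forever by repeatedly calling template.gen_record().
    while True:
        yield template.gen_record()


class StubTemplate:
    # Hypothetical stand-in for a plait.py Template; only gen_record() is assumed.
    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def gen_record(self):
        return {"age": self._rng.randint(18, 90)}


records = stream_records(StubTemplate())
first_three = list(itertools.islice(records, 3))  # take a finite slice
print(first_three)
print(next(records))  # or pull one record at a time
```

Since stream_records is a generator function, the returned object already supports both next() and for-loops, so a simulator can consume it for as long as it likes.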

------
dgrant
Could I use this to generate XML?

~~~
logv
if you already have a way of printing XML, you can add a "printer" field (that
is a python function) to your template, like so:
[http://github.com/plaitpy/plaitpy/blob/master/templates/test...](http://github.com/plaitpy/plaitpy/blob/master/templates/testcase/codechef.yaml)

if that function uses an import, you might also need to add an "imports"
field, like in this example:
[https://github.com/plaitpy/plaitpy/blob/master/templates/web...](https://github.com/plaitpy/plaitpy/blob/master/templates/web/browser_with_geoip.yaml#L26)

otherwise, that's a feature that can be added here:
[https://github.com/plaitpy/plaitpy/blob/master/src/fields.py...](https://github.com/plaitpy/plaitpy/blob/master/src/fields.py#L1019).
if it works for you (and is added as a flag), i'd be happy to take patches.
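
For the "way of printing XML" part, a standalone helper like the following might be a starting point. It assumes a record is a flat dict whose keys are valid XML element names; how plait.py actually calls a "printer" function is not shown here, and record_to_xml is a hypothetical name:

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, root_tag="record"):
    # Build one child element per key; values are converted with str().
    # Assumes every key is a valid XML element name.
    root = ET.Element(root_tag)
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


xml_text = record_to_xml({"name": "Ada", "age": 36})
print(xml_text)
```

Nested records or attribute output would need more handling than this flat version provides.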

------
bedros
What advantages does this have over faker?

------
dgrant
Could this be used to generate XML?

~~~
ship_it
Yes, it's open-source.

------
dalacv
a faker faker

