Show HN: Plait.py – a fake data modeler (github.com)
85 points by logv 6 months ago | 19 comments

I've found the faker library [1] useful.

The only fake data I've bothered to model is weighted age ranges. Fortunately, as of Python 3.6, weighted sampling is available through random.choices [2] in the stdlib.

[1] https://github.com/joke2k/faker

[2] https://docs.python.org/3/library/random.html#random.choices

example: https://gist.github.com/Dowwie/8409d871ddae913e44c61bc4d47ce...
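
A minimal sketch of the weighted age range idea using random.choices. The brackets, the weights, and the fake_ages helper are made up for illustration and are not taken from the linked gist:

```python
import random

# Hypothetical age brackets and weights, chosen only for illustration.
AGE_RANGES = [(18, 24), (25, 34), (35, 49), (50, 64), (65, 90)]
WEIGHTS = [15, 30, 25, 20, 10]


def fake_ages(n, rng=random):
    """Pick a bracket by weight, then a uniform age inside that bracket."""
    brackets = rng.choices(AGE_RANGES, weights=WEIGHTS, k=n)
    return [rng.randint(lo, hi) for lo, hi in brackets]


if __name__ == "__main__":
    print(fake_ages(10))
```

Passing a seeded random.Random instance as rng makes the output reproducible.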

part of what prompted the work on plait.py was that joke2k/faker was reasonably slow to generate 10K fake names for me: https://paste.ubuntu.com/26354987/

PS. that's a really cool python tip!

Well done, then! :) Is plait a drop-in replacement for faker?

it's a drop-in replacement for stympy/faker, but not for joke2k/faker

joke2k/faker is python and the data is stored in code (all or most of the random values are in .py files around the codebase), perhaps leading to its slowness.

stympy/faker is ruby and its random values are in yaml files, with some fields defined as ruby functions (those are not supported by plait.py).

you can use 'plait -l', 'plait -ll name' and 'plait -ll name.name' (more info in the README) to get a list of the fake fields available.

if/when you tire of the latest performance improvement, consider porting to Rust and exposing it to python via cffi

Another great approach to generating notional data is using the Haskell QuickCheck library and specifically the Arbitrary type class. Super simple and extremely flexible/composable.

Probably also available in other languages.

The ones I used and can recommend:

“hypothesis” package in Python (+pytest plugin)

“rapidcheck” in C++

“quickcheck” in Rust

good list for generating model data / fake data

testcheck is a JavaScript equivalent.

Interesting, though I must ask: Why YAML instead of a DSL?

Granted, I come from Ruby, and writing DSLs is pretty typical. Maybe not so popular in Python.

I ask because I become suspicious of config languages that read like code. Isn't a bona fide programming language the better choice in this scenario? That is, don't all overly configurable formats (e.g. Terraform .tf files, JSON schemas...) converge on just being a new scripting language?

good point! i'm not against a DSL. as I was working on plait.py, one thought going through my head was: "am i re-writing haskell or lisp, but worse?" my experience with DSLs in python is that I need to use YACC / PLY to create a grammar and so on, which may be a lot of work. i take it that it's easier in ruby?

yaml was a format I chose because it is easy to write (close to human), but cannot express full programming concepts (though it does allow some metaprogramming). i did not want the templates to be full-powered, as they are meant to express relationships between variables but not much more (especially not side effects). they also support lazy evaluation: statements do not need to be in order. this is closer to a "mathematical language" for me.

the choice of yaml was also based on the premise that if performance becomes an issue, i can hopefully move to another language but retain the templates (will have to re-implement python's "random" compat, though)
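
To illustrate what "statements do not need to be in order" can mean, here is a rough sketch of lazily evaluated fields. This is not plait.py's implementation. LazyRecord and the field names are hypothetical, and the idea is simply that each field is computed on first access and cached, so a field may be declared before the field it depends on:

```python
import random


class LazyRecord:
    """Evaluate each field on first access and cache it, so declaration order does not matter."""

    def __init__(self, fields):
        self._fields = fields        # name -> function taking the record
        self._values = {}            # cache of already computed fields
        self._in_progress = set()    # used to detect circular definitions

    def __getitem__(self, name):
        if name not in self._values:
            if name in self._in_progress:
                raise ValueError(f"circular dependency on field {name!r}")
            self._in_progress.add(name)
            try:
                self._values[name] = self._fields[name](self)
            finally:
                self._in_progress.discard(name)
        return self._values[name]

    def to_dict(self):
        return {name: self[name] for name in self._fields}


# Hypothetical fields: "birth_year" is declared before the "age" it depends on.
FIELDS = {
    "birth_year": lambda r: 2018 - r["age"],
    "age": lambda r: random.randint(18, 90),
}

if __name__ == "__main__":
    print(LazyRecord(FIELDS).to_dict())
```

Because each value is cached, "age" is drawn once per record even if several fields refer to it.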

Cool, is there a way to use it to dynamically generate data (for streaming)? Would be nice to be able to just call something like .next() and get another record so a simulator can run for an indefinite period of time.

if you create a template and keep calling .gen_record(), i think it will do what you want. Template() does not implement python's __next__ or __iter__ at the moment, but that's a good idea - i'm very open to diffs :-D
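
Until something like that lands, a plain generator can wrap the call. This sketch assumes only that the template object has the gen_record() method mentioned above. stream_records and StandInTemplate are hypothetical names, and the stand-in class exists just so the example runs without plait.py installed:

```python
import itertools
import random


def stream_records(template):
    """Yield records forever from any object exposing gen_record()."""
    while True:
        yield template.gen_record()


class StandInTemplate:
    """Hypothetical stand-in for a real template object."""

    def gen_record(self):
        return {"age": random.randint(18, 90)}


if __name__ == "__main__":
    # Take three records from the endless stream.
    for record in itertools.islice(stream_records(StandInTemplate()), 3):
        print(record)
```

A simulator can call next() on the generator for as long as it needs records.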

Could I use this to generate XML?

if you already have a way of printing XML, you can add a "printer" field (that is a python function) to your template, like so: http://github.com/plaitpy/plaitpy/blob/master/templates/test...

if that function uses an import, you might also need to add an "imports" field, like in this example: https://github.com/plaitpy/plaitpy/blob/master/templates/web...

otherwise, that's a feature that can be added here: https://github.com/plaitpy/plaitpy/blob/master/src/fields.py... if it works for you (and is added as a flag), i'd be happy to take patches.

What advantages does it have over faker?

Could this be used to generate XML?

Yes, it's open-source.

a faker faker
