I use it with factory_boy (http://factoryboy.readthedocs.org/en/latest/) to generate test fixtures that seem to make logical sense. Usernames are real names, birthdays are real dates, etc.
I think it helps with experimentation when you're using the REPL and also makes bugs stand out a bit more easily. Very neat for demoing purposes too.
Perhaps my favourite is faker.bs(), which always gets a giggle when doing live demos.
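For illustration, a minimal factory_boy sketch along these lines — the User class here is a hypothetical stand-in for a real model, not anything from the library:

    import factory
    from faker import Faker

    fake = Faker()

    # Hypothetical plain class standing in for a real ORM model.
    class User:
        def __init__(self, username, birthday):
            self.username = username
            self.birthday = birthday

    class UserFactory(factory.Factory):
        class Meta:
            model = User

        # Each attribute is drawn fresh from faker for every instance.
        username = factory.LazyFunction(fake.user_name)
        birthday = factory.LazyFunction(fake.date_of_birth)

    user = UserFactory()  # e.g. username='jsmith', birthday=date(1987, 3, 14)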
It's a standard gem in the community and it's used for generating various types of seed database data. Don't know if it does exactly the same things that the Python faker does.
The value of testing with real data is that it doesn't conform to your assumptions.
As far as I can tell, this benefit is impossible to fake with a system that generates fake data algorithmically. Generated data conforms to the assumptions of the system that generated it and therefore can only be used to test that a system conforms to those assumptions.
Fake data is still useful. Volume is often important (does your database slow down or crash when there are 10 billion records?). And if your fake data has very few assumptions, you can use that to reduce the assumptions made by the system you're testing.
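As a rough sketch of the volume point, you can stream fake rows to a file and bulk-load it afterwards — file name and row count here are arbitrary:

    import csv
    from faker import Faker

    fake = Faker()

    # Streaming to CSV keeps memory flat even at large volumes;
    # bulk-load the file into the database to test at scale.
    with open("users.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "email", "address"])
        for _ in range(1_000_000):  # dial up to taste
            writer.writerow([fake.name(), fake.email(),
                             fake.address().replace("\n", ", ")])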
Nevertheless, I'd really like to see a system like this which integrates data from some sort of general-purpose real dataset. Ideally it would be configurable so that people can document and choose a 99% use case they want to support (for example, a US company might want to support long names, but might not get a ton of value from supporting names with Chinese characters).
The common use of fuzzers in a security context is to send malformed packets to protocol parsers to see if they fall over, cause buffer overruns, or otherwise do fun things in the context of exploiting a system. Another common one is automatic SQL-injection discovery tools.
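A naive version of that idea in Python might look like this — host and port are placeholders, and real fuzzers are far smarter about mutation and crash detection:

    import random
    import socket

    def fuzz_once(host, port, size=1024):
        """Send one blob of random bytes at a TCP service and move on."""
        payload = bytes(random.getrandbits(8) for _ in range(size))
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(payload)

    # A crash, hang, or error on the server side is the interesting signal.
    for _ in range(100):
        fuzz_once("127.0.0.1", 8080)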
...from the University of Oulu. It's more like a framework for generating intelligent fuzzers than a shrink-wrapped product, though.
The OUSPG guys are really good at fuzzing. There is also a commercial spin-off, Codenomicon, whose tools are quite widely used.
The command "dd if=/dev/urandom bs=1000 count=1" will spit out 1 KB of pseudorandom data you can pipe, POST or otherwise send to your application. (GNU's implementation lets you use "1K" as well.)
I can see how explicit execution of the startup code is a good thing, but I can't help thinking how much better the experience would be if it just lazy-loaded the same code.
Am I missing something obvious that would prevent this? Bad magic?
In Andrew Ng's Machine Learning class he talked about taking labeled images and expanding the set by inverting, shearing, flipping, distorting them etc. He called the technique 'data synthesis'.
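A small sketch of that technique using Pillow — the transforms chosen here are just examples:

    from PIL import Image

    def synthesize(path):
        """Derive extra labeled examples from one image via simple transforms."""
        img = Image.open(path)
        yield img.transpose(Image.FLIP_LEFT_RIGHT)  # mirror
        yield img.transpose(Image.FLIP_TOP_BOTTOM)  # vertical flip
        yield img.rotate(15, expand=True)           # small rotation
        # Horizontal shear via an affine transform (shear factor 0.2).
        yield img.transform(img.size, Image.AFFINE, (1, 0.2, 0, 0, 1, 0))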
The test data problem has been hampering my team's ability to create maintainable automated tests.
I used to maintain one over 15 years ago.
At least the city, street address, postal code, and telephone number had to be internally linked. Those are things that can be checked easily and automatically, so the same constraints need to hold when generating data. It's also silly to give a flat address in an area where there aren't any flats, etc. A 30th floor in the countryside? Oh yeah. A distance-based rural road address downtown? Just as silly.
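A toy sketch of what linking those fields might look like — the locale table here is made up purely for illustration:

    import random

    # Illustrative locale records: city, postal prefix, and phone area code
    # all come from the same row, so generated contacts stay consistent.
    LOCALES = [
        {"city": "Helsinki", "postal_prefix": "00", "area_code": "09"},
        {"city": "Tampere",  "postal_prefix": "33", "area_code": "03"},
    ]

    def fake_contact():
        loc = random.choice(LOCALES)
        return {
            "city": loc["city"],
            "postal_code": loc["postal_prefix"] + f"{random.randint(0, 999):03d}",
            "phone": f"{loc['area_code']} {random.randint(1_000_000, 9_999_999)}",
        }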
Another interesting library that is built on top of Faker is Alice. It allows you to define complex fixtures in .yml:
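Roughly along these lines, going by Alice's documented format — the class and field names are illustrative:

    Nelmio\Entity\User:
        user{1..10}:
            username: <username()>
            fullname: <firstName()> <lastName()>
            birthDate: <dateTimeBetween('-30 years', '-18 years')>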
Something kind of similar and worth thinking about is this:
I've only done serious work with QuickCheck in Haskell, but here's the Python implementation I've played with:
The original Perl implementation is Data::Faker (https://metacpan.org/pod/Data::Faker). The earliest version available on CPAN appears to be from 2005.
testdata has a lot of Unicode and file system stuff I've found really useful; it looks like between this and testdata I'll be in generated data heaven :)
The best places to learn are from the canonical libraries: QuickCheck in Haskell, Quviq's QuickCheck in Erlang, simple-check in Clojure, and there are others.
The challenge with all of these methods is that you want some notion of referential transparency in order to make useful properties. You can at least do that in certain contexts for certain expressions in Ruby and doing so will improve code readability.
I'd love to hear from others with experience using these techniques in Ruby or Python.
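For Python, a minimal property-based test with hypothesis — one QuickCheck-style library; the property here is just an example — looks like:

    from hypothesis import given, strategies as st

    # State a property over all inputs; hypothesis searches for counterexamples
    # and shrinks any failure down to a minimal reproducing case.
    @given(st.lists(st.integers()))
    def test_sorting_is_idempotent(xs):
        assert sorted(sorted(xs)) == sorted(xs)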
Testing at scale is important for performance and for predicting bottlenecks as you grow (i.e., testing to find the breaking point of your system's capacity).
It can be difficult to generate good-quality test data at scale, especially data tailored to your specific schema.
This is how http://goodtestdata.com/ came about. It has the building blocks of core data and new sources can be built on request.
- I wanted fake CC numbers and SSNs/other national IDs at the time (don't remember why). I see that Faker is missing those, so they might be useful additions to the library (see the Luhn sketch below).
- Method names should be snake_case rather than camelCase (http://www.python.org/dev/peps/pep-0008/#method-names-and-in...).
I don't recommend my version: it isn't maintained and isn't complete. However, writing a faker-type library is a great way to learn a language: you learn about how to organize code, how it handles different types, and how to package it up for use.
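On the CC-number point above: fake card numbers are easy to do properly because the Luhn checksum is public. A minimal sketch — prefix and length are arbitrary, and this produces numbers for testing only:

    import random

    def luhn_check_digit(partial):
        """Compute the Luhn check digit for a card number missing its last digit."""
        digits = [int(d) for d in partial]
        # Double every second digit from the right; the check digit will
        # occupy the rightmost position of the finished number.
        for i in range(len(digits) - 1, -1, -2):
            digits[i] *= 2
            if digits[i] > 9:
                digits[i] -= 9
        return (10 - sum(digits) % 10) % 10

    def fake_card_number(prefix="4", length=16):
        body = prefix + "".join(str(random.randint(0, 9))
                                for _ in range(length - len(prefix) - 1))
        return body + str(luhn_check_digit(body))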
Very, very useful in any build.