Show HN: A tool that transforms your whole list with just one example (transformy.io)
764 points by createmyaccount on Apr 24, 2015 | 144 comments



Amazing tool.

Feature requests: allow for more than one example.

    Input:
    {"class": "101", "students": 101}
    {"class": "201", "students": 80}
    {"class": "202", "students": 50}
    {"class": "301", "students": 120}

    Example:
    Class 101 has 101 students

    Output:
    Class 101 has 101 students
    Class 201 has 201 students
    Class 202 has 202 students
    Class 301 has 301 students
Right now the first line cannot have any ambiguity. This is fixable by reordering, but with large enough data sets I may have some ambiguity in all lines, at different places. Multiple examples would fix that.
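To illustrate, here is a minimal Python sketch of how intersecting candidate matches across several examples could pin down which input field each output token refers to (hypothetical mechanics, not necessarily what transformy does internally):

    import re

    def candidates(input_line, example_out):
        # For each token of the example output: the set of input-field
        # indices it could have come from (empty set = literal text).
        fields = re.findall(r"\w+", input_line)
        return [{i for i, f in enumerate(fields) if f == tok}
                for tok in re.findall(r"\w+", example_out)]

    def disambiguate(pairs):
        # Intersect the candidate sets across several (input, example)
        # pairs; the ambiguity shrinks with every extra example.
        merged = None
        for inp, out in pairs:
            cands = candidates(inp, out)
            merged = cands if merged is None else [a & b for a, b in zip(merged, cands)]
        return merged

    # One example leaves "101" ambiguous (input field 1 or 3)...
    print(disambiguate([('{"class": "101", "students": 101}',
                         "Class 101 has 101 students")]))
    # ...a second example resolves every slot uniquely.
    print(disambiguate([('{"class": "101", "students": 101}',
                         "Class 101 has 101 students"),
                        ('{"class": "201", "students": 80}',
                         "Class 201 has 80 students")]))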

Again, loved the tool. I can see this going very far, especially with non-technical people.


For that use case you can use the Lapis[1][2] desktop app (the secret weapon I use for data munging), which allows you to choose several examples and edit a file with direct manipulation - or define patterns using a DSL.

[1]https://en.wikipedia.org/wiki/Lapis_(text_editor)

[2]http://groups.csail.mit.edu/uid/lapis/


I love how that editor "just works" 12 years after its last release. Thanks!


It looks like a Java 1.1 UI from 1996.

But thanks for the suggestion, it might be useful.


You could just add a "dummy" input as the first line with all unique entries. Then just remove it from the output. Am I missing something? It's a tiny bit of additional manual processing, but doesn't seem unreasonable.


Glad to hear you like it! Thanks for posting your use case as well, helps us out a lot!


Class 201 has 80 students


Right, but if you have a thousand items, with a dozen fields each, you can't be sure the one example you've picked will resolve all ambiguities. But if you could supply four or five example lines, the chances of ambiguity drop off.


As mentioned by others, this is implemented in Excel 2013 as the Flash Fill feature.

https://www.youtube.com/watch?v=UccfqwwOCoY


I just attended a talk by Sumit Gulwani where he demo'd FlashFill and FlashExtract and walked through their underlying architecture. His talk had much cooler demos than any I could find online, but from what I can tell they provide a superset of transformy's functionality (not to detract from it; this is very cool, and I'd be curious to see how related the underlying theories are). Apparently current work by that team at Microsoft is focused on abstracting the functionality out into a system called FlashMeta so that it can be applied to a bunch of domain-specific problems. Overall very exciting work, from both parties.


Do you know if this is available on the new Excel for Mac too?


No. The shortcut does not work, nor is there an entry in the "Fill" menu.

It's one of many missing features.


Nice. Excel really is one of the most undervalued pieces of software: it has so many details like this, and it handles almost anything you throw at it. Pivot tables are another of the lesser-known features that are hard to live without once you've found out they exist.


Given that Excel runs many, if not most financial organizations at a practical level, I'm not sure that it's undervalued in practice.

Excel is to finance people as terminals are to developers. When getting bug reports from end users, we've even had them delivered to us as screenshots embedded in Excel files.


Afaik, "LeBron" is his first name. The way you use it (and misspell it) is confusing, I'd suggest you replace him with a simpler alternative (Kobe Bryant? Tim Duncan? whatever).

Same for Kagawa -- the Japanese use surnames in a different way, might be simpler to replace him.


I don't follow basketball and really thought LeBron was his family name. Thanks!


This area of research is called "program sketching", which creates programs by example.

http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-17...

edit: see below


It's more accurate to call it "programming by example", which is a much older term, in common use, and actually means "creating programs by example".

http://en.wikipedia.org/wiki/Programming_by_example

Sketching means to provide a partial specification whose details are filled in by the system. But in this case, the user is providing a full description of a single concrete example. These are different concepts.


Thanks for the clarification; they are similar but not the same. It looks like _sketching_ is more analogous to using Hindley-Milner for filling in gaps in an executable spec, whereas Programming by Example infers code from data (examples). There is a wealth of interesting material referenced in http://web.media.mit.edu/~lieber/Your-Wish-Intro.html


Sketching is a superset of programming by example. With examples, you only get input/output pairs, which, as this thread shows, leads to inferring frustratingly close-but-wrong programs.

In sketching, the input and output can also include partial programs (sketches), so you can mark what you like, tweak, etc., and it fills in the rest.

To increase usability, the fragments can be in different languages. For example, the input can be simple C and some test data, and the output can be CUDA GPU code or pthread mutex locking schemes.

For Graphistry.com, we believe in these techniques for ETL and sketching visualizations.

(Also, sketching normally uses machine learning or SAT/SMT solvers: types are more typically used for input hints.)


Cool. I have been mulling over what it would be like to have a semigraphical tool for generating SQL queries (Spark, Hive, VoltDB, etc.) where the user drives the keyboard, applying operators over columns, rows and relations for mapping, filtering and joining. Like Vim plus ZPL, where the goal is a statistical result over some range. Why can't Prolog programs generate our programs? One of my query visualizations is an origami-like structure where data is joined across a relation, something like an Explain Plan done in Hollywood.

By "analogous", I meant, more generally, using some partial knowledge about a program or spec to fill in missing pieces, not types specifically.

Nice to know folks are using these techniques for real-world tasks; I always thought something like this would first be used for cleaning data. Types, properties, sketches and examples.

Three levels of relatedness, https://vimeo.com/22606387


Indeed -- check out my strangeloop talk on places we did this :) http://www.infoq.com/presentations/dsl-visualization


On your main screen, make the example editable. It would be nice to be able to just type into the green box to see how it works, rather than having to click through "Get Started".

Also, your instructions make it seem like the example is editable:

SUPER EASY TO USE

1. Paste your source data in the white box on the left.

2. Type in the green box on the right how you would like the first line of your data to look.

3. Transformy will look at your example and transform every line from your source data into the same format.


You're right that our instructions make it seem like the examples are editable. We'll work on that. Thanks for your feedback!


+1


This is a great idea! It didn't behave the way I expected with some URLs as input, though:

    http://example1.org/path/index.html
    http://www.example2.org/path/index.html
    http://www.example3.org/
    https://www.example4.org/a/b/c/d/e/f/g/hijklmnop
The pattern I gave was:

    example1.org, path/index.html
So I expected to get:

    example1.org, path/index.html
    www.example2.org, path/index.html
    www.example3.org, 
    www.example4.org, a/b/c/d/e/f/g/hijklmnop
Instead, I got:

    example1.org, path/index.html
    wwwexample.2, org/path.index
    wwwexample.3, org/.
    wwwexample.4, org/a.b
A few feature requests: allow downloading the output as a text file; show a pseudo-code formula of how transformy interpreted the transformation, like "s/.+:\/\/(.+?)\/(.*)/\1, \2/"; and add support for common arbitrary transformations like "November"↔"NOV"↔"11", or "2"↔"2nd".
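That second request is easy to satisfy for this particular case; a tool could print something like the following back as its "formula" (a Python sketch of the regex I wanted, not what transformy actually inferred):

    import re

    # scheme://host/rest-of-path, with the host and path captured
    pattern = re.compile(r".+?://(.+?)/(.*)")

    for url in ["http://example1.org/path/index.html",
                "http://www.example2.org/path/index.html",
                "http://www.example3.org/",
                "https://www.example4.org/a/b/c/d/e/f/g/hijklmnop"]:
        m = pattern.match(url)
        print(f"{m.group(1)}, {m.group(2)}")
    # example1.org, path/index.html
    # www.example2.org, path/index.html
    # www.example3.org,
    # www.example4.org, a/b/c/d/e/f/g/hijklmnop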


I think it's trying to be too magical. At this point it either just seems to work, or something triggers its pattern matching wrong and it's really hard to figure out what or why. I think giving up a little of the simplicity in favor of more control is worthwhile. For example, if the formatting portions of the example were differentiated from the data-matching portions, it wouldn't be much more complicated, but the intent would be much clearer.

For example, if the rules were: example content must be contained within braces, and any braces within the example content need to be escaped, it's clear. At that point, your example becomes:

  {example1.org}, {path/index.html}
It would still probably just return "wwwexample.4, g/hijklmnop" for the last example though, because it's ambiguous whether you want just the end of the URL or the whole thing. Allowing regex markup for more explicit matching would make it clearer still, but your example still causes problems until you go all the way to positive lookbehind assertions. At the point where I need to learn all that, I might as well just use perl:

  # perl -pe 's{.*https?://([^/]+)(/\S*).*}{$1, $2}' /tmp/foo
  example1.org, /path/index.html
  www.example2.org, /path/index.html
  www.example3.org, /
  www.example4.org, /a/b/c/d/e/f/g/hijklmnop
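For what it's worth, the brace rule itself is only a few lines to parse; a rough Python sketch of the proposed syntax (hypothetical, not transformy's parser; escaped braces are omitted for brevity):

  import re

  def parse_example(example):
      # Split "{data}literal{data}" into tagged pieces so a tool knows
      # which spans to locate in the input line and which are formatting.
      pieces = re.split(r"\{(.*?)\}", example)   # odd indices = braced spans
      return [("data" if i % 2 else "literal", p)
              for i, p in enumerate(pieces) if p]

  print(parse_example("{example1.org}, {path/index.html}"))
  # [('data', 'example1.org'), ('literal', ', '), ('data', 'path/index.html')]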


TXR language:

   @(repeat)
   @proto://@domain/@path
   @(do (put-line `@domain, @path`))
   @(end)


Amongst other things, this can be used for cleaning special characters out of tables/lists, changing date formats, and creating XML or JSON.

Feedback and suggestions are very much welcome! We plan on adding a few more features soon, as right now it is fairly basic, but we would like to hear some opinions and see if there are people out there who have a use for this.


This is really neat! Any chance this will be a cli tool or module/library?

It doesn't seem to play well with something like this as input:

    3, Roberto/Carlos, soccer, Brazil
    35, Roberto/Carlos Michael Jordan, baseball, USA
    6, Roberto/Carlos James Lebron, basketball, USA
    10, Roberto/Carlos Shinji Kagawa, soccer, Japan
Format:

    3, ROBERTO/CARLOS, soccer, Brazil
Gives me:

    3, ROBERTO/CARLOS, soccer, Brazil
    35, ROBERTO/CARLOS, Michael, Jordan
    6, ROBERTO/CARLOS, James, Lebron
    10, ROBERTO/CARLOS, Shinji, Kagawa
I can't seem to find a way to get it to parse that out properly (playing with the ROBERTO/CARLOS part).

I even tried this as an input:

    3, Roberto Carlos, soccer, Brazil
    35, Roberto Carlos Michael Jordan, baseball, USA
    6, Roberto Carlos James Lebron, basketball, USA
    10, Roberto Carlos Shinji Kagawa, soccer, Japan
Format:

    3, ROBERTO CARLOS, soccer, Brazil
Gives me:

    3, ROBERTO CARLOS, soccer, Brazil
    35, ROBERTO CARLOS, Michael, Jordan
    6, ROBERTO CARLOS, James, Lebron
    10, ROBERTO CARLOS, Shinji, Kagawa
Edit: format


My brain doesn't understand what you're trying to do either. Why is Roberto/Carlos on every line?


I formatted their examples to look like some real data I have (obviously not names, but descriptions of some projects). I was curious how this would handle it.

In any case, get rid of the "/" and it's closer to the real thing. Some people have more than two names in their full name. And on a slightly larger set you could very well have something close to my second example.


Currently it matches word by word, so for example if someone has a family name in two parts, like "Van Buyten", it won't work. I think it's the same problem in your example: the first "column" contains multiple words in some cases. We'll be fixing this in a future release!


I thought a CLI version might be useful too. The closest thing I have right now is sed/awk. Sed can do this kind of stuff, but you have to specify a regular expression instead of a simple example. Because you have to be more specific about what you want, sed will definitely handle those examples, with the caveat that you have to tell it what to substitute and where for each line.

http://linux.die.net/abs-guide/x19673.html

It took me about a year of use before I could figure out how to munge lines in it, so it's definitely not for the faint of heart. I use it for things like transforming excel spreadsheets into C struct arrays.


I was curious and sketched up something similar to this website in about 100 lines of Python. It has a CLI; have a look if you're interested:

https://gist.github.com/martinthenext/fc989ffa6ec84ee09962


Why would it work? Your data isn't even isomorphic.

On the first line, you have "Roberto Carlos" followed immediately by a comma. On subsequent lines you have Roberto Carlos followed by two other names.

Also your example works fine if you use a different delimiter for your format vs for your input, e.g.

  3 | ROBERTO CARLOS | SOCCER | Brazil



Given this "35, Roberto/Carlos Michael Jordan, baseball, USA" tuple what are you expecting as output?


I was expecting that as the output but got: 35, ROBERTO/CARLOS, Michael, Jordan

See the other response, as it works word by word. Here is (hopefully) a better example:

Input:

    35, Billy Jean, soccer, USA
    29, Billy Jo Jean, football, USA
Transform-at:

    35, soccer, Billy Jean, USA
You'll get:

    35, soccer, Billy Jean, USA
    29, Jean, Billy Jo, football
But I was expecting:

    35, soccer, Billy Jean, USA
    29, football, Billy Jo Jean, USA


Right, I understand now: it's using words as atoms rather than breaking fields at the commas and using those.


Does it work by a general library for learning patterns, or as a collection of heuristics to match common cases?

That difference would change the kind of examples I would try it with, knowing in advance when they're too complex to work.


Currently there's no real machine learning or advanced pattern matching involved. But we're certainly working towards that!


Good concept, but it doesn't work. For example, type in different variations of legal, well-formatted addresses:

  1 Microsoft Way Apt 43, Redmond, WA 98065, U.S.A.
  1-1/4 Palm Hwy, Colino, MA 87009, USA
  500 Potasium Cloride, Sunshite-Big Blow City, PA 30000, United States of America
First line output should look like:

  1 Redmond 98065 U.S.A.
Also, having obscure country-specific sports terminology in the landing-page example can cause a lot of confusion.


Figuring out a format for addresses is actually really hard.

https://www.mjt.me.uk/posts/falsehoods-programmers-believe-a...


Right, marking areas (in this case, the terms between commas), like Google's Webmaster Tools structured-data designer does, would help. This would require some kind of regular expressions in the example, at least, which of course would make matters more complicated. This tool excels at simple data sets; I don't think it is meant to be universal.


We intend to support this in future versions but haven't gotten to it yet. It makes things a little more complicated.


Buggy: all I did was add the middle name "Dean" to the third line.

  In:
  3, Roberto Carlos, soccer, Brazil
  35, Michael Jordan, baseball, USA
  6, James Dean Lebron, basketball, USA
  10, Shinji Kagawa, soccer, Japan

  Ex: Carlos is number 3 playing soccer

  Out:
  Carlos is number 3 playing soccer
  Jordan is number 35 playing baseball
  Dean is number 6 playing Lebron (what??)
  Kagawa is number 10 playing soccer
I guess you can't really solve the ambiguity of "Carlos" meaning the second word in the second column versus the last word of the second column; but the commas should at least hint at a tabular pattern, no?
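A toy word-by-word matcher reproduces the behaviour; the first occurrence wins, which is exactly where "Dean ... playing Lebron" comes from (a guess at the mechanism, not the site's actual code):

  import re

  def infer(first_line, example):
      # Example tokens that appear in the first input line become
      # word-position slots; everything else stays literal text.
      words = re.findall(r"\w+", first_line)
      return [words.index(t) if t in words else t
              for t in re.findall(r"\w+", example)]

  def fill(template, line):
      words = re.findall(r"\w+", line)
      return " ".join(words[s] if isinstance(s, int) else s for s in template)

  tpl = infer("3, Roberto Carlos, soccer, Brazil",
              "Carlos is number 3 playing soccer")
  print(fill(tpl, "6, James Dean Lebron, basketball, USA"))
  # Dean is number 6 playing Lebron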


I hope you sell your IP to Microsoft Excel. This would be a major time saver feature for a lot of the Excel world.


This is already in Excel as FlashFill, so I don't know if you're being sarcastic. http://research.microsoft.com/en-us/um/people/sumitg/


Not sarcastic... I didn't know about FlashFill. I looked at the videos, and it seems that it's not as powerful as this tool. In Excel the results would be a custom cell format combined with custom text functions.



Not sure about the implementation in Excel. At least the paper (POPL 2011) describes more powerful functions, containing e.g. case distinctions and simple loops; there are some examples in there. I'm not an Excel expert, but I would be surprised if cell formats could do that.


I could have used this hundreds of times in Visual Studio.

We often process messages with hundreds of fields. So I will have a class from specs (usually Excel) with a number of properties, then I need a method to populate each of those from some other object, then I need a method that does the reverse and populates some other object from the properties.

This happens over and over for us. Typically I just use a Notepad++ macro, but I could probably use this as it stands; having it in Visual Studio would be really incredible.


Install a vim emulator inside of Visual Studio, then use macros.


For this crowd, I think the 3rd slide with the JSON transformation should be the first slide.


For some reason it won't handle certain characters...

    Input:
    Feature: 37, threshold: 4386, +
    Feature: 11, threshold: 1, +
    Feature: 10, threshold: 13, +
    Feature: 0, threshold: 34, +
    Feature: 39, threshold: 44, +

    Example:
    x[37] >= 4386

    Output:
    x[37]  4386
    x[11]  1
    x[10]  13
    x[0]  34
    x[39]  44


Thanks for your feedback!

We'll fix this soon.


You know, for people that don't know Awk or aren't comfortable with a scripting language, this is a really nice idea. Thinking back to grad school, which was in a non-computer science quantitative field, there are lots of people that would have appreciated having something like this easily available.


It's useful for those people too; it saves you the time/mental load of writing (and potentially debugging) a regex.


I love the product.

I have a small piece of feedback on the site. You could make it a tiny bit clearer that this is a free, registration-free service which people can start using with just one click.

When I first visited the site, I looked it over, noticed the email box and the "get started" and just assumed it was a library I'd need to buy. It wasn't until I came back to the comments here that I realised the site was a service (which is actually extremely useful to me, and it has been bookmarked).

Why not just make http://www.transformy.io/#/app your homepage instead?


That was originally the plan, but we showed it to some non-technical people and they only understood the idea once we showed some examples.

But you are right that we should make it more clear that it's free. Thanks!


On this same subject, I would just make the boxes on the very first page editable so you can just play with it right away. After reading the description at the bottom I spent a few seconds trying to manipulate the text boxes on the front page before I realized I had to click a button. It would be really cool if they were live editable examples right on the front page.


This is a very cool tool. I wouldn't trust it with any sensitive info though. The lack of terms, https, and the fact that it's closed source means I have no idea of what could happen to the data I put in there.


If anyone's interested, here's my own rather less sophisticated effort for these kinds of odd jobs: http://whalemerge.com/


If you're translating into text that's meant to be readable, it seems like you need to add a few items to your dataset that give additional information on natural language.

For example, I added some information in the example below about which pronoun to use based on gender. Would be really neat to have this sort of information built into the tool.

Input:

    {name: "James", age:"30", hobby: "running", genderWord: "his"}

    {name: "Erin", age:"28, hobby: "cooking", genderWord: "her"}

    {name: "Owen", age:"3", hobby: "playing chase", genderWord: "his"}

    {name: "Luke", age:"1", hobby: "reading", genderWord: "his"}
Example: James is 30 years old and his favorite hobby is running

Output:

    James is 30 years old and his favorite hobby is running

    Erin is 28 years old and her favorite hobby is cooking

    Owen is 3 years old and his favorite hobby is playing chase

    Luke is 1 years old and his favorite hobby is reading


This works rather poorly.

  Input:
  {"message":"hello there","id":1}
  {"message":"why hello there","id":2}

  Example:
  {"id":1,"message":"hello there"}

  Output:
  {"id":1,"message":"hello there"}
  {"there":id,"message":"why hello"}


Nice tool and concept, dude.

But it doesn't seem to look properly at the meaning of the content, though.

I mean, I think it just finds the first occurrence of the given pattern and generates the result.

For example, I attempted this, with input data as date/time.

input data:

  2015-09-15T09:15:17-05:00
  1998-11-05T08:15:21-05:00
  1999-01-03T04:33:30-05:00
  2000-11-05T09:16:00-05:00
pattern:

  09:15 on 09-15
result:

  09:15 on 09-15
  11:05 on 11-05
  01:03 on 01-03
  11:05 on 11-05
What I was expecting:

  09:15 on 09-15
  08:15 on 11-05
  04:33 on 01-03
  09:16 on 11-05
Although I helped it to a certain extent by using the same special-character pattern the input data has.

And as I feared, it doesn't handle special characters, nor does it use them in the process, as @j2kun has mentioned in a comment.

But it is promising and has nice use cases too. :)


With the TXR language:

  @(collect)
  @year-@month-@{day}T@hh:@mm:@ss-@tzh:@tzm
  @(end)
  @(output)
  @  (repeat)
  @hh:@mm on @month-@day
  @  (end)
  @(end)
Or:

  @(repeat)
  @year-@month-@{day}T@hh:@mm:@ss-@tzh:@tzm
  @(do
     (put-line `@hh:@mm on @month-@day`))
  @(end)
On the command line:

  $ txr -c '@(repeat)
  blah
  ...
  @(end)' - # dash for stdin or file name
From a file:

  $ txr script.txr file
I see we have a mistake in the handling of time zones; the minus sign is part of the time zone offset. Perhaps a small dash of regex:

  @year-@month-@{day}T@hh:@mm:@ss@{tzh /[+-]\d\d/}:@tzm


Shouldn't be hard to update it to allow a second example row to be given so as to disambiguate. Alternatively, just expect the user to give a more perspicuous example of what they want. (I like that word, sorry.)


One can achieve the same in Sublime Text using the multiple-cursors editing feature. This is great for non-tech people.

For those of you wondering what Sublime Text can do, give the Sublime Text video series on Tuts+ a visit; it's awesome and teaches you the power of Sublime Text.


One can also achieve it in emacs using the multiple-cursors package. https://github.com/magnars/multiple-cursors.el

But I wouldn't necessarily call emacs "great for non tech people".

(I suppose it's tiring for people to keep pointing out "yeah, emacs can do this too". Sorry.)


This same example done in Sublime: https://www.youtube.com/watch?v=90uUdHyAACY


For a second I thought this was some kind of data mining/ML/search tool that could transform

  USA, Barack
  Germany, Angela

to

  USA, Obama
  Germany, Merkel

based on a single example. Do this please. :)


There is a bug with multiword data points:

  Input:
  Bogdan, "Yucca"
  Josy, "Orange County"
  Bill, "San Diego"
  
  Example:
  Bogdan lives in Yucca

  Output:
  Bogdan lives in Yucca
  Josy lives in Orange
  Bill lives in San
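A quote-aware tokenizer would fix this particular case by treating a quoted phrase as one atom (a Python sketch of the idea, not the site's actual code):

  import re

  def atoms(line):
      # A quoted phrase counts as a single atom instead of several words.
      return [t.strip('"') for t in re.findall(r'"[^"]*"|[^,\s]+', line)]

  print(atoms('Josy, "Orange County"'))   # ['Josy', 'Orange County']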


This is neat. I find myself wanting more detail on what works, though. For example, I c/ped your original example and tried "Roberto C. from Brazil."

It didn't infer that "C." meant to truncate the last name, so everything ended up as "John C." No biggie, but trying to figure out what does and doesn't work, aside from tokenized string formatting, was a bummer. Having the uppercase example led me to believe it could do more types of transformation.

Possibly the right answer is hinting of some kind: "Roberto {C.} from Brazil" to hint that the "C." should be matched with -something-, and since "." naturally means abbreviation, it would mean "starts with C".


I made a list of restaurants:

Input:

  taco bell 1
  mcdonalds 2
  wendys 3
  bojangles 4
  dairy queen 5
Ex: 1. Taco Bell

Output:

  1. Taco Bell
  . Mcdonalds 2
  . Wendys 3
  . Bojangles 4
  5. Dairy Queen
When I take the spaces out of the restaurant names, I get:

New Input:

  tacobell 1
  mcdonalds 2
  wendys 3
  bojangles 4
  dairyqueen 5
Ex: 1. TacoBell

New Output:

  1. Tacobell
  2. Mcdonalds
  3. Wendys
  4. Bojangles
  5. Dairyqueen


Avi Bryant demo'd this exact concept at an old CUSEC: https://vimeo.com/4763707#t=27m20s

I wish I knew which paper he was referring to - it's a great feature :).


For more, I gave a feel for how to rethink the full data pipeline using these ideas at Strange Loop: http://www.infoq.com/presentations/dsl-visualization . It pulls on several projects from program synthesis at Berkeley. (These directly led to applications mentioned here, like FlashFill.)


Any chance of making a plugin for Sublime Text?


If you're already in Sublime, just use multiple cursors. https://www.youtube.com/watch?v=90uUdHyAACY


That I could get behind. Often I have to use Sublime's regex find/replace to build my own data munging things that a plugin like this could solve.


I'd usually use 'perl -n -e' if I needed something like this. Not suggesting that perl is a better alternative but it's a reason why I'd never need to use that tool.

Here's the corresponding perl program using the same data and output as on the transformy home page:

  pbpaste | perl -p -e 's/(\d+), (\w+) (\w+), (\w+), (\w+)/@{[uc($3)]}, jersey number $1/'
To use that (pbpaste is, I think, a Mac-only feature), first copy/paste that into your terminal, then copy the list, then hit enter in the terminal.


This! I've wanted this feature in every text editor for years :)). Like http://nimbletext.com, but show-and-tell instead of expressions :).

Thanks!!!


This is what Vim macros are all about. You record the macro on a single item and repeat it on all the others.


Can I make a script out of the learned transform operation? Ideally it would be a function that I could paste into some script and map over each row.
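Something like this is what I'd hope to get back: the learned template compiled into a plain function you can map over rows. (A hypothetical output format in Python; integer slots pull word positions, strings are emitted literally.)

  import re

  def make_transform(template):
      def transform(row):
          words = re.findall(r"\w+", row)
          return " ".join(words[s] if isinstance(s, int) else s
                          for s in template)
      return transform

  f = make_transform([2, "is", "number", 0, "playing", 3])
  print(list(map(f, ["3, Roberto Carlos, soccer, Brazil",
                     "10, Shinji Kagawa, soccer, Japan"])))
  # ['Carlos is number 3 playing soccer', 'Kagawa is number 10 playing soccer']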


I like the idea.

Does not seem to handle my main use case, however (transforming a schema entry in Rails to a list of symbols). E.g.:

Input:

  t.integer "Id"
  t.integer "Active"
  t.string "Email", null: false
  t.string "CryptedPassword", null: false
  t.datetime "created_at"
  t.datetime "updated_at"

Example:

  :Id

Output:

  :Id
  :Active
  :Email
  :CryptedPassword
  :created
  :updated


This is pretty neat. Since we're asking for features: smarter date conversions. For instance, take the input "2015-04-24" and the example output "24 April 2015"; then, if another line has "2015-03-01", it would output "1 March 2015". This seems like a somewhat difficult problem, but it'd be magical if it worked.
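The mechanical half of that is easy once both formats are known; the magic would be inferring the formats from the example. A Python sketch of the easy half, assuming ISO input:

  from datetime import datetime

  def to_long_date(s):
      d = datetime.strptime(s, "%Y-%m-%d")
      # "%-d" (unpadded day) is glibc-specific, so build the day by hand
      return f"{d.day} {d:%B %Y}"

  print(to_long_date("2015-04-24"))   # 24 April 2015
  print(to_long_date("2015-03-01"))   # 1 March 2015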


Anglo-centric only, though; it would be impossible to handle non-English month names.


Typo on third example on main page: {name: 'Lennon', intrument: 'guitar'} should be instrument.


Great tool, but too bad it can't handle code very well.

An example (converting C++ to C):

    Input:
    void FileStream::write(char* , int);
    void VirtualMachine::cycle(int);

    Example:
    void FileStream_write(char* , int);

    Output:
    void FileStream_write(char* , int);
    void VirtualMachine_cycle(int* , );
Also, is it open-source?


TXR script:

  @(repeat)
  @type @class::@ident(@params);
  @  (output)
  @type @{class}_@ident(@params);
  @  (end)
  @(end)
I do this kind of thing regularly over C code, when it's too much for Vim macros.

I do it from inside Vim. That is: first select a range of text, then pipe it out:

  !txr some_transform_script.txr -
Done. For example, when adding functions to TXR Lisp's library, I start with a declaration like:

  static val foo(val x, val y);
I pipe this through a script which will produce this:

  reg_fun(intern(lit("foo"), user_package), func_n2(foo));
("Intern a symbol called "foo" in the user_package, and register a two-argument function object with this symbol, hosted from the C function foo.")

It's a little complicated:

  @(deffilter sym ("_" "-"))
  @(collect)
  @  (cases)
  @/(static )?/val @fun(void);@(bind arg nil)
  @  (or)
  @/(static )?/val @fun(@(coll)@{type /[^ ,]+/} @{arg /[^ ,)]+/}@(until))@(end));
  @  (end)
  @(output)
    reg_fun(intern(lit("@{fun :filter sym}"), user_package), func_n@(length arg)(@fun));
  @(end)
  @(end)
I filter underscores to dashes, because I want a C function like foo_bar to look like foo-bar in the Lisp dialect. The (void) argument list is handled as a special case.

I need to parse the arguments because the output part needs to know how many there are. Note how "func_n2" is generated, where the 2 comes from the argument count.


This is essentially the idea behind Warp, which can do it on large data sets and databases (https://pixelspark.nl/2015/warp-a-query-by-example-analysis-...)


I'd love to see an embeddable version of this, or an API. It'd be awesome to embed this into our CMS.


Feature request:

Output javascript/python/regex/whatever that performs the transformation between the two lists.


On a similar note: anyone know of a tool for generating SQL queries by example?


Warp does it under the hood (you can do query by example directly on databases). See https://pixelspark.nl/2015/warp-a-query-by-example-analysis-...


Did you know the name "Query by example" was devised for a tool to generate SQL queries?


Both SQL queries and Regex expressions by example would be amazing.


Regex by example seems to be what this tool is doing. Maybe OP/author could add an option to view the constructed expressions?


Great use for quickly cleaning up stuff without looking into the `sed` options.


The last example, with the JSON replace, is really nice; I do this regularly with regular-expression find-replace with groups on larger datasets. I guess I can forget about regular expressions now. Nice work!


I'd be keen to use something like this as an offline library. Is there anything similar out there for auto-detecting data formatting and structure, but as a library instead?


I like it!

I found that it doesn't like "=" in the transformation result.


I also noticed this:

  foo, bar
  baz, bog

  first="foo", second="bar"

  first"foo", second"bar"
  first"baz", second"bog"


Will look into that. Thanks for the feedback!


How is this done? Do you need to use some fancy AI or machine-learning algorithm?

Looks pretty mind-blowing.

Mad respect.


Thanks a lot!

It's pretty basic at the moment, but we'll be launching smarter and smarter versions as we go.


This is very cool. Are there any restrictions on using the API?


I tried a list of Unicode characters and their codepoint numbers, but it doesn't seem to recognize the Unicode characters properly. Perhaps a normalization issue?
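If it is normalization, the classic symptom looks like this, and it's easy to check with Python's unicodedata (a guess at the cause, not a confirmed diagnosis):

  import unicodedata

  s1 = "\u00e9"     # 'é' as one precomposed code point
  s2 = "e\u0301"    # 'e' followed by a combining acute accent
  print(s1 == s2)                                  # False: different code points
  print(unicodedata.normalize("NFC", s2) == s1)    # True after NFC normalization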


Really great. Manipulating lists is the hardest thing in the world for ordinary people, while for any programmer it is really easy. This should even things out a little.


Not sure that this is doing what it's supposed to. This series:

  11/1/2008, 12/1/2008, 6/1/2009

transformed to this:

  November 1, 2008, November 1, 2008, November 1, 2009


We don't interpret the data, we only change the format. Since "November" doesn't literally match anything on the first line, it is repeated for each one of them.


It didn't work for me either. I guess someone's friends are downvoting people who find flaws in it!


See my other comment.


Sometimes I have to use a bunch of numbers in an SQL query and surround them with single quotes and a comma. Seems like a good fit, thanks.


Another bug:

I added a middle initial to the first player (Carlos), but no other lines, and typing in "Carlos plays soccer" results in:

  Carlos plays soccer
  baseball plays USA
  basketball plays USA
  soccer plays Japan

Cool idea though.


I attempted to extract postal codes from CSV lines that contained addresses (as well as some unrelated info). Complete failure :(


I've been using regular expressions in Notepad++ for doing this kind of thing. This is awesome and would save a lot of time.


It would be cool if I could provide an xml file as input, and use the example to generate an xpath query quickly.


There is a bug with the tool. I have list data in this format (data modified for privacy):

  abc, The Institue of
  xyz, The school of
  nbc2, The college of
  jor5, School of

and if I try to extract just the name of the person, it gives me the following:

  abc
  xyz
  nbc
  jor

It's missing the numbers.


I think Makefiles have this kind of pattern matching, which could be used to do such list transformations too.


In Chrome it doesn't work: it never seems to update in real time, only at random long intervals.


We're having some problems because of the traffic (we didn't expect to reach the front page); it should be smoother in a little while.


Ah, so it does the calculations on the server? My bad, I assumed it was calculating everything in real time on the client. What is the server written in?


Backend is Ruby. Should work again now!


OK! And now what are we supposed to do with sed, tr, grep, etc?? ;-)


Who made this? I want to know who I'm giving mad respect to. :)


What is the monetization strategy? I'm curious.


Any command-line version of this? API? :)

PS: A very nice tool!


This would be useful as an npm module.


For this list:

  tennis 3 4 5
  wilson 1 2 3
  robert 5 6 7

Some patterns that didn't work:

  tennis nis
  tennis 12


loool @ tennis nis


Not sure why it was downvoted; you could potentially chain inferences like:

  tennis 12 even


Useful if Emacs is unavailable.


And it's also useful for people who have never used Emacs :)


Yep. This is great. Well done.


Any Emacs libraries that will mimic this?


Keep it up, guys!


"We're impoving transformy"

Start by improving your proofreading :)


Thanks for catching that! :)


[deleted]


This is expected behavior. I'm assuming you gave "a 1" as an example? Since your first line didn't contain the number 1, it is repeated over every line.


All this does is some fancy search and replace; I don't know why it's considered amazing at all. Using awk is almost as effective.


Is anyone else bothered by the fact that 5 years ago this would have been a free command-line tool, but nowadays it's a closed-source web app instead?


The audience likely to make the most use of a tool like this is not the same as the audience who would be comfortable using a command-line tool.

I mean, you can replicate the core functionality of this fairly easily using awk, and if you're happy doing a bit of piping to perl or whatever, the fancier time re-formatting stuff is also easy.

In essence, the complexity in this tool (and what makes it cool) is figuring out what you are trying to do without being told. If you can run a command-line tool, you can tokenise the input yourself, and you're most of the way there already.



