Show HN: Lume – automate data mappings using AI (lume.ai)
76 points by nmachado 10 months ago | 26 comments
Hi HN! I'm Nicolas, co-founder of Lume, a seed-stage startup (https://www.lume.ai/).

At Lume, we use AI to automatically transform your source data into any desired target schema, so onboarding client data or integrating with new systems takes seconds rather than days or weeks. In other words, we use AI to automatically map data between any two schemas and return the transformed data to you.

We are live with customers and are just beginning to open up our product to more prospects. Although we do not have a sandbox yet, here is a video walkthrough of how the product works: https://www.loom.com/share/c651b9de5dc8436e91da96f88e7256ec?.... And here is our documentation: https://docs.lume.ai. We would love to get you set up to test it, so please reach out.

Using Lume: we do not have self-serve yet. In the meantime, you can request full access to our API through the Request Access button at https://www.lume.ai. The form asks for some basic information (e.g., your email) so that I can reach out and onboard you. Please mention you came from HN and I'll prioritize your request.

How our full API product offering works: Through Lume’s API, users can specify their source data and target schema. Lume’s engine, which includes AI and rule-based models, creates the desired transformation under the hood by producing the necessary logic, and returns the transformed data in the response.
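
Roughly, a call looks like this (a simplified sketch: the endpoint and field names below are illustrative, not our exact API; see https://docs.lume.ai for the real thing):

    # Hypothetical shapes for illustration; see https://docs.lume.ai
    # for the actual endpoints and payloads.
    import requests

    payload = {
        "source_data": [{"FirstName": "Ada", "LastName": "Lovelace"}],
        "target_schema": {
            "type": "object",
            "required": ["first_name", "last_name"],
            "properties": {
                "first_name": {"type": "string"},
                "last_name": {"type": "string"},
            },
        },
    }

    resp = requests.post(
        "https://api.lume.ai/example/transform",  # hypothetical endpoint
        json=payload,
        headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    )
    print(resp.json())  # transformed records, now in the target schema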

We also support mapper deployment, which lets you edit and save the AI-generated mappers for important production use cases. That way, you can confidently reuse a static, deterministic mapper in your data pipelines.

Our clients have three primary use cases:

- Ingest Client Data: Each client you work with handles data differently. They name, format, and structure their data in their own way, which means you have to redo the ingestion work for each new client's data.

- Normalize data from unique data systems: To deliver value, your team needs to connect to various data providers or handle legacy data. Creating pipelines for each one is time-consuming, and differences as small as column names between systems make it burdensome to get started.

- Build and maintain data pipelines: Creating pipelines that map to your target schema, whether for BI tooling, downstream data processing, or other purposes, means you have to manually create and maintain mappings between schemas.

We're still figuring out pricing, so it's not on our website yet - sorry. We wanted to share this even though it's still at an early stage.

We’d love your feedback, ideas & questions. Also, feel free to reach out to me directly at nicolas@lume.ai. Thank you.




Best wishes from qarl, co-founder of Lume (1998)

https://web.archive.org/web/19981201053816/http://www.lume.c...


Wow! It's a pleasure, qarl :) Reach out anytime.


Congratulations on the launch! I’d love to see something like this succeed. However, there are some challenges that you might have to overcome. Here are some random thoughts:

Ingest Client Data - You will have to find customers who ingest dynamic data schemas. The example in the video shows more of a standard schema, which can be mapped once (using Lume or otherwise); there's no need to add overhead or extra cost to run that data pipeline.

Normalize Data - One of the challenges will be to establish quality metrics for these mappings. Based on the demo, the quality score is supposed to be 100% all the time, but that's far from the truth. Real data is messy. Validations will catch a lot of issues, but there will still be cases where incorrect mappings slip through. The ability to provide metrics around this will be very helpful for adoption.
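
For example, even a simple schema pass-rate metric over transformed records would be more informative than a flat 100% (a sketch using the jsonschema library; the schema and records are made up):

    # Validate each transformed record against the target schema and
    # report the pass rate -- this catches structural issues, though not
    # mappings that are valid but semantically wrong.
    from jsonschema import Draft7Validator  # pip install jsonschema

    target_schema = {
        "type": "object",
        "required": ["first_name", "last_name"],
        "properties": {
            "first_name": {"type": "string"},
            "last_name": {"type": "string"},
        },
    }

    def pass_rate(records):
        validator = Draft7Validator(target_schema)
        ok = sum(1 for r in records if not list(validator.iter_errors(r)))
        return ok / len(records) if records else 1.0

    records = [{"first_name": "Ada", "last_name": "Lovelace"},
               {"first_name": None, "last_name": "Hopper"}]
    print(f"schema pass rate: {pass_rate(records):.0%}")  # 50%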

Response Time - I’m not sure if you’re using OpenAI or your own models in the background. Even small per-record latency in the pipeline adds up to hours or days of delay across hundreds of millions of records.

All the best!


So could your AI automatically create a data flow that solves one of the 'Advent of Code' problems, such as:

https://adventofcode.com/2023/day/1
https://adventofcode.com/2023/day/2


I guess not...


We haven't had the bandwidth to run it on this example yet, but we'll report back when we do!


Hey, I looked at the solution. It seems like you are using JSON schemas to validate the response from the LLM and iterating until it matches. Again, I don't see why I would use this in my app, as the API call will increase latency, and I can use the same method to generate Rust or Go code from GPT-4 or other code-focused LLMs. Am I missing something here?


We actually leverage LLMs very sparingly. We do not generate the transformed data directly, as this would introduce not just significant latency but also quality and reliability issues. Rather, we use LLMs to produce high-level mapping/transformation logic in a language of our design, which is then deterministically executed to produce your desired data. This means LLMs are used only when you introduce new data formats that require new logic, and surgically even then. The vast majority of usage so far (by volume) leverages the logic already created in the underlying pipeline and does not have latency issues. This also allows for building reliable, stable pipelines with our APIs, a requirement that's difficult to meet given the non-determinism of LLMs.
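
To make the pattern concrete, here is a simplified sketch (not our actual engine or mapping language, just an illustration of the generate-once, execute-deterministically idea):

    # Illustrative only: a declarative mapping spec (the kind of artifact
    # the AI produces once) applied deterministically to every record, so
    # no model call happens on the hot path.
    RECORD_SPEC = {
        "first_name": {"op": "copy", "source": "FirstName"},
        "full_name": {"op": "concat", "sources": ["FirstName", "LastName"], "sep": " "},
    }

    def apply_spec(spec, record):
        out = {}
        for target, rule in spec.items():
            if rule["op"] == "copy":
                out[target] = record[rule["source"]]
            elif rule["op"] == "concat":
                out[target] = rule["sep"].join(record[s] for s in rule["sources"])
            else:
                raise ValueError(f"unknown op: {rule['op']}")
        return out

    print(apply_spec(RECORD_SPEC, {"FirstName": "Ada", "LastName": "Lovelace"}))
    # -> {'first_name': 'Ada', 'full_name': 'Ada Lovelace'}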


Who is your target customer? I periodically have to do this sort of mapping, so it seems helpful to me, but typically these are one-off things, so I can't justify purchasing a product.


Our customers have a recurring need to map data and use those mappings, such as onboarding clients, normalizing data from multiple systems, or building and maintaining data pipelines. All of these share the common denominator of continually having to create and maintain mappers (e.g., across n clients). You can learn more here: https://www.lume.ai/use-cases. Of course, if we can be of help, reach out to nicolas@lume.ai.


Hey Nicolas! Best of luck with the product. You gave me a demo a while back and it's excellent - excited to see what comes of it.


Thank you!


Best of luck, Nicolas. For occasional complex data mappings, how does Lume adapt? Can users customize it for specific one-off tasks?


Customers can use Lume for one-off mappings as well as recurring ones. As seen in the demo video, you can create a mapping between any source and target schema, so the workflow is the same for both one-off and recurring cases. Is this what you were referring to?


So this is like Flatfile but also for APIs?


Great question. We focus on embedding in your data pipelines themselves: our AI automatically maps data, and the result can be used as a data pipeline indefinitely. Indeed, it can connect to APIs and handle dynamic output or edge cases you did not expect. We also handle transformations of any complexity (1-1 mappings all the way to string manipulation, classification, aggregations, etc.).


Hi, could you please roughly explain how you verify that a transformation is successful and correct?


Yes! Once the transformation job has completed, you can review the mapping in the returned job payload and in the Lume dashboard. From the dashboard you can review, edit, and deploy the mapping pipeline. There are two ways to fix mappings: edit the target schema (e.g., make a required target field nullable), or manually override our mapping by providing the correct mapping value from the source data. I've also attached a Loom video showing this workflow: https://www.loom.com/share/95e47ead923d4911b647456174142e00


These are all just OpenAI wrappers with a nice UX; better to build your own prompt and go straight to the source. You get more visibility into errors/edge cases and the ability to leverage new model capabilities as they come out. What's more, as your use case gets more complex, you will outgrow these APIs. You could have just written your own prompt to begin with and added edge cases as they arose.
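
For example, a bare-bones version of the DIY approach (using the OpenAI Python SDK; the model, prompt, and record here are just placeholders):

    # Minimal sketch: one prompt straight to the model, returning a JSON
    # record you validate and apply yourself.
    import json
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    source_record = {"FirstName": "Ada", "LastName": "Lovelace"}
    target_fields = ["first_name", "last_name", "full_name"]

    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": "Map this record to the target fields; reply with JSON only.\n"
                       f"Record: {json.dumps(source_record)}\n"
                       f"Target fields: {target_fields}",
        }],
    )
    print(json.loads(resp.choices[0].message.content))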


Thanks for the comment. We get this question often, and our most common answer is that our customers tried to solve their mapping problems with direct OpenAI calls first. When that did not yield the reliability, consistency, and scalability they expected, they ended up onboarding to Lume. We've put extensive work into the overall engine (and the underlying AI), to the point where customers see it as a significant value add over their own attempts at leveraging OpenAI directly.


Are your transformations written in SQL?


They are written in Python, but SQL is on the roadmap.


Check out generating Ibis, which can output SQL and many dataframe formats (pandas, polars, modin, ...).
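
A quick sketch of what that buys you (assumes ibis-framework with its default DuckDB backend installed; the column names are made up):

    # Define the mapping once as an Ibis expression, then reuse it to
    # emit SQL or run it directly on a dataframe backend.
    import ibis
    import pandas as pd

    df = pd.DataFrame({"FirstName": ["Ada"], "LastName": ["Lovelace"]})
    t = ibis.memtable(df)

    mapped = t.select(
        first_name=t.FirstName,
        last_name=t.LastName,
        full_name=t.FirstName + " " + t.LastName,
    )

    print(ibis.to_sql(mapped))  # the same mapping, compiled to SQL
    print(mapped.execute())     # or executed via the default backend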


Sounds very useful, thank you for sharing.


The animation on the homepage puts my processor at 100% (Firefox). I know that's only a UI annoyance and not really product feedback, but it made me close the tab faster than usual, and other users might, too.


Thank you for calling this out! I'll look into getting a smaller version in there.



