Launch HN: Datrics (YC W21) – No-Code Analytics and ML for FinTech
44 points by avais 6 days ago | 10 comments
Hey everyone, we're Anton (avais), Kirill (Datkiri), and Volodymyr (vsofi), the founders of Datrics (https://datrics.ai). We help FinTech companies build and deploy machine learning models without writing code.

We provide a visual tool to work with structured data by constructing a diagram of data manipulations from lego-like bricks, and then execute it all on a backend. This lets our users accomplish tasks that usually need a team of software engineers, data scientists, and DevOps. For instance, one of our customers is a consumer lending company that developed a new risk model using just our drag-and-drop interface.

I used to lead a large data science consultancy team, being responsible for Financial Services (and Risks specifically). Our teams’ projects included end-to-end risk modeling, demand forecasting, and inventory management optimization, mostly requiring combined efforts from different technical teams and business units to be implemented.

It usually took months of work to turn an idea into a complete solution, going through data snapshot gathering to cleansing to experimenting to working with engineering and DevOps teams to turn experiments in Jupyter notebooks into a complete application that worked in production. Moreover, even if the application and logic behind the scenes were really simple (could be just dozens or hundreds of lines of code for a core part), the process to bring this to end-users could take ages.

We started thinking about possible solutions when a request from one of the Tier 1 banks appeared, which confirmed that we’re not alone in this vision: their problem was giving their “citizen data scientists” and “citizen developers” power to do data-driven work. In other words, work with the data and generate insights useful for business. That was the first time I’d heard the term “citizen data scientist”. Our users are now these citizen data scientists and developers, whom we’re giving the possibility to manipulate data, build apps, pipelines, and ML models with just nominal IT support.

Datrics is designed not only to do ML without coding, but to give analysts and domain experts a drag-and-drop interface to run queries, generate reports, and do forecasting visually with nominal IT support. One of our core use cases is better credit risk modeling: creating application scorecards based on ML, or applying rule-based transactional fraud detection. For this use case, we’ve developed intelligent bricks that let you do variable binning and scorecards in a visual way. Other use cases include building reports and pivot tables that aggregate sales data from different countries in different formats, or optimizing inventory by forecasting demand, all without knowing any programming language.

We’re providing 50+ bricks to construct ETL pipelines and build models. There are some limitations - a finite number of pre-built building blocks that can be included in your app, but if there is no block that you need, you can easily build your own (https://youtu.be/BQNFcZWwUC8).

Datrics was built cloud-native, but it can also be installed on-prem for customers whose security policies or setups require it. The underlying technology, the pipeline execution engine, is rather complex and is currently built on top of Dask, which lets Python scale to big datasets. In the next release, we are going to support Pandas as well, and switch intelligently between small datasets for rapid prototyping and big datasets for pipeline deployments.

We’re charging only for private deployments, so our web version is free: https://platform.app.datrics.ai/signup. Try it to create your analytical applications with a machine learning component! We've put together a wiki (https://wiki.datrics.ai) to cover the major functionality.

We are super-excited to hear your thoughts and feedback! We're big believers in the power of Machine Learning and self-service analytics and are happy to discuss what you think of no-code approaches for doing ML and analytics generally as well as the availability of them for non-data scientists. Or anything you want to share in this space!

Because financial data generally means time-series, autocorrelations abound, and it becomes very easy to develop a model that underperforms naive baselines (e.g. LOCF, arima) or peeks into the future through improper cross validation or feature engineering.

If the customer gives you a column that peeks into the future (e.g. "quarterly sales" when each row is a sale in that quarter), you'll build a model that looks great on metrics and to the customer, and might take months for the customer to realize was practically useless. Are you able to reliably prevent these kinds of issues at a technical level, or do you lean towards customer education ("don't give us quarterly sales") instead?
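To make the concern above concrete, here is a minimal sketch of the two checks it implies: splitting so that every test row comes after every training row, and scoring a LOCF (last observation carried forward) baseline on the same folds. The function names and the toy series are hypothetical, and this is not a description of how Datrics validates models.

```python
from typing import Iterator, Sequence, Tuple


def walk_forward_splits(
    n_rows: int, n_folds: int, min_train: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; test rows always follow train rows."""
    if min_train < 1:
        raise ValueError("min_train must be at least 1")
    fold_size = (n_rows - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough rows for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        # The last fold absorbs any leftover rows.
        test_end = train_end + fold_size if k < n_folds - 1 else n_rows
        yield range(0, train_end), range(train_end, test_end)


def locf_mae(series: Sequence[float], test: range) -> float:
    """MAE of the LOCF baseline: predict each value with the previous one."""
    errors = [abs(series[i] - series[i - 1]) for i in test]
    return sum(errors) / len(errors)


# Toy data, ordered by time (hypothetical values).
series = [10.0, 11.0, 13.0, 12.0, 15.0, 16.0, 18.0, 17.0, 20.0, 22.0]
splits = list(walk_forward_splits(len(series), n_folds=3, min_train=4))
baseline = [locf_mae(series, test) for _, test in splits]
print(baseline)  # [2.0, 1.5, 2.5]
```

A model evaluated on these folds is only interesting if its error beats the baseline numbers fold by fold. A shuffled K-fold split would let future rows into training and hide exactly the problem described above.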

Great question, and we've actually seen this happen: some payments-based features were used in scoring, which is data leakage. We don't enforce "hard" constraints against this, but there are two ways for the user to spot it. The first is feature importance, which is built automatically, so the user will see these features immediately. The second is the process of building inference: the same data prep/transformation pipeline is meant to be used as an API or for batch processing before the model is applied, and at that point they will definitely realise that they just don't have these features yet for new customers!
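As an illustration of the first check, a leaked column usually stands out because it separates the classes almost perfectly on its own. The sketch below is an assumption about how one could automate that signal, not Datrics' implementation: it computes a single-feature AUC and flags anything above a threshold. The feature names, data, and the 0.95 cutoff are all hypothetical.

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def suspicious_features(
    features: Dict[str, Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.95,
) -> List[str]:
    """Return features whose single-column AUC is implausibly close to 0 or 1."""
    flagged = []
    for name, values in features.items():
        a = auc(values, labels)
        if max(a, 1.0 - a) >= threshold:
            flagged.append(name)
    return flagged


# 1 = defaulted, 0 = repaid (hypothetical toy data).
labels = [0, 0, 0, 1, 1, 1]
features = {
    "income_k": [30, 55, 40, 35, 60, 45],
    # Only known after the loan is issued, so it leaks the outcome.
    "missed_payments_after_approval": [0, 0, 0, 2, 3, 1],
}
print(suspicious_features(features, labels))
# ['missed_payments_after_approval']
```

A flag like this is only a prompt for a human to ask whether the column exists at scoring time; a legitimately strong feature can also score high.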

Congratulations on launch! ML tools/services/platforms are really hard to build. You need to juggle not only frontend/backend frameworks but also make them work with ML frameworks. There are so many corner-cases that can make the whole app crash.

How does it differ from open-source AutoML frameworks like https://github.com/mljar/mljar-supervised or drag-and-drop tools like Azure ML Studio?

Is not having to write code the killer feature here?

Do you have a financial data enhancement feature? If not, do you plan to add one?

Re killer feature: I believe it's being end-to-end and providing an experience for "citizen x" and domain experts, rather than only ML engineers. So it's no-code data cleansing and transformation, plus an ML layer on top of this, plus productionalization, all while being cloud-agnostic. We found that the problem we're trying to solve is bigger than just giving non-developers the possibility to use ML instruments. There are a lot of citizen data scientists who want to automate their spreadsheet work, make inference easy once the model is created without too much effort spent changing the processing pipeline, etc. That's why we don't compete with Azure ML Studio, as they are more developer-focused, or with AutoML tools, as ML is only one feature of our platform.

In addition, we've designed and implemented our own pipeline scheduling algorithm and out-of-the-box APIs, so finished pipelines or models can easily be embedded into a business process or used outside of the platform.

Right now it is possible to integrate third-party data with custom code + API calls, but I believe we need to work on this extensively to provide an easier way to do it. Happy to learn if you have some specific sources in mind?

yeaaah - we've run into the corner-case problem a lot. We're covering the functionality with unit tests based on mock data, but we're still validating many things manually and sometimes need to do hot-fixes :) Though, the good side of this is that users don't need to do this testing on their end when building on top of Datrics, so we (hopefully) save them some time here.

What makes this specific to fintech vs. general ML tools? I led risk at a fintech and it’s unclear how this is better than generalized solutions.

We're building templates (pre-built pipelines) for particular use cases, so that it's easy to start. We have one for credit risk specifically! In addition, some functionality is tailored for risk: binning of variables based on information value, good/bad statistics with WoE, and building a scorecard, scaled by odds or with a min-max scaler, directly from a LogReg model. Does it make sense?
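For readers who haven't met the terms, here is a minimal sketch of weight of evidence (WoE) and information value (IV) for one already-binned variable. It assumes the common convention WoE = ln(share of goods / share of bads) per bin and IV = sum over bins of (share of goods - share of bads) * WoE; some references flip the ratio. The bin labels and counts are made up, and this is not Datrics' code.

```python
import math
from typing import Dict, Tuple


def woe_iv(bins: Dict[str, Tuple[int, int]]) -> Tuple[Dict[str, float], float]:
    """bins maps a bin label to (good_count, bad_count); returns (WoE per bin, IV)."""
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe: Dict[str, float] = {}
    iv = 0.0
    for label, (good, bad) in bins.items():
        if good == 0 or bad == 0:
            # WoE is undefined for an empty class; such bins are usually merged.
            raise ValueError(f"bin {label!r} has an empty class; merge it first")
        dist_good = good / total_good
        dist_bad = bad / total_bad
        woe[label] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[label]
    return woe, iv


# Hypothetical income bins: (repaid, defaulted).
income_bins = {
    "< 30k": (100, 50),
    "30k-60k": (300, 30),
    "> 60k": (200, 20),
}
woe, iv = woe_iv(income_bins)
print({k: round(v, 3) for k, v in woe.items()}, round(iv, 3))
```

Negative WoE marks a bin with more than its share of bads. Binning "based on information value" typically means choosing the cut points that keep IV high while keeping the bins large enough to be stable.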

Do you have any integrations with platforms that a fintech might be using (e.g. TSYS)?

Not yet. I think integrations are our weak point at the moment: we support just CSV, JSON, and SQL DBs (MySQL, MSSQL, PostgreSQL), and we build custom integrations based on customers' needs. We're going to extend the list of standard connectors in the next few months, and we're discussing priorities with our clients and prospects.
