Show HN: Database for analyzing US companies, visualize using Apache SuperSet (tesseractanalytics.ai)
113 points by tessbi on April 19, 2023 | 44 comments
My main motivation was that I wanted to be able to drill down and filter across all the available stocks, look at the data for myself, narrow down to the stocks I am interested in based on my own sets of criteria, and do data-driven analysis for my personal investment strategies.

I used PostgreSQL as the backend database for the ELT data pipelines, and Citus Data's cstore_fdw for columnar compression of the final dataset. All financial data comes from SEC EDGAR (https://www.sec.gov/developer). I used Python to download most of the data.

I also run the data-load development locally on a home Ubuntu server that I built 5 years ago. I bought 4TB of M.2 disks for the best database performance, with a PRIME B360M-A motherboard and an Intel Coffee Lake-S chip.

I built the website with WordPress, and I run Apache Superset under gunicorn behind an Apache web server reverse proxy.

I had to build the registration form myself with PHP and some JavaScript, because it needed to automatically create a Superset user upon registration. Otherwise, I would need to add everyone manually. I used Python again for the data integration.

Please don't use the database directly as an investment tool: it's in beta, and the data still needs to undergo heavy data quality checks. Please confirm all the numbers yourself; I provide a link to the SEC filings for every company.




It's unusual to require, or even ask for, personal information combining name, email, and phone number to register, especially for a free site from an untrusted source. There's no way I'm putting my information in there.


I am deeply offended that you do not trust me, a stranger on the internet! I am hearing that many people feel I am asking for too much data, so I removed the phone number requirement from registration. I did still want to keep the email verification step, so that I know the people using my database are actual people and not bots/trolls, and so that I can reach them with new features and establish a professional relationship going forward. I hope that simply giving me your name and email is not too much to ask. When I meet people at any event, it is often customary to exchange some information to stay in contact. Cheers!


Interesting! Whenever I am in a more active investing phase I feel like I don't have the data available I would like to have for making decisions.

I am getting 'Database error: Unknown error' when trying to run a query. Maybe too much load at the moment.

Regarding the sign-up form: it expects the phone number to follow the format '888 888 8888', including the blanks. I would consider dropping the phone number altogether, or at least allowing other formats.
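
For what it's worth, "allowing other formats" could look roughly like the sketch below. It assumes US numbers only, and the function name normalize_us_phone is made up for illustration; it is not how the site actually validates input.

```python
import re
from typing import Optional


def normalize_us_phone(raw: str) -> Optional[str]:
    """Return a 10-digit US number as 'XXX XXX XXXX', or None if it is not one.

    Hypothetical helper: accepts spaces, dashes, dots, parentheses and an
    optional leading +1 / 1 country code.
    """
    digits = re.sub(r"\D", "", raw)  # keep only the digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the US country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]} {digits[3:6]} {digits[6:]}"
```

The form could then store the normalized value and only reject input when the helper returns None.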


Thank you for the feedback! The phone number is no longer required for registration. If you still have issues registering, please email me: alex@tesseractanalytics.ai


Account creation failed because I chose a user name that already existed.

When I changed the username, account creation failed because my email and phone were already taken. I guarantee they weren't. I think the system actually registered me the first time, and I'm now totally locked out from using that email.
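
I can't see the real code, so this is only a guess at the failure mode: the first attempt stored my email in one place and then failed on the username in another. A minimal sketch of doing both writes in one transaction, using an in-memory SQLite database and made-up table names (site_users, dashboard_users) rather than the actual WordPress/Superset setup:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create two hypothetical user tables standing in for the two systems."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE site_users (email TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE dashboard_users (username TEXT PRIMARY KEY, email TEXT)"
    )
    return conn


def register(conn: sqlite3.Connection, username: str, email: str) -> bool:
    """Insert into both tables, or into neither if any constraint fails."""
    try:
        with conn:  # commits on success, rolls back both inserts on error
            conn.execute("INSERT INTO site_users (email) VALUES (?)", (email,))
            conn.execute(
                "INSERT INTO dashboard_users (username, email) VALUES (?, ?)",
                (username, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this shape, a rejected username leaves the email free, so retrying with a different username works. If the two accounts live in separate systems, a single transaction is not available and the first write would need to be undone explicitly when the second fails.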


Oh no!!! I am so sorry to hear this, I will work on fixing this. If you are still interested in trying out the data, please email me: alex@tesseractanalytics.ai. I can register you manually.


I have no use for this but I've gotta say this is pretty awesome, great job!


Thank you so much! This has been my project for the past 3 years.


For most questions, please look through the tutorial videos and documentation that I created! :)

https://tesseractanalytics.ai/data-tutorials/

https://tesseractanalytics.ai/finance-data-quarterly-data-do...


This seems like you're going down the rabbit hole into an area where only the big players can compete. It's also a fool's errand, in my opinion.

There's a growing sentiment that the stock market is rigged, and I've seen several instances where that almost certainly seems to be the case and no action has been taken. It started with the COMEX eligible silver and gold holdings which have never been independently audited.

Then GME and FRC more recently.

Once you realize the mechanisms that allowed those to be successful, you realize they can do it with any publicly traded company, and it's no longer an investment because the fundamentals have nothing to do with it. You can cause companies to fail simply by causing aggregated indebtedness violations. Synthetic shares can be created indirectly, which lets you create more selling or buying than exists in the float. There's only one way the basic mechanics say it goes when that exists.

It's a casino, no longer a real investment, and any company with better funding alternatives won't go public if they intend to stay in business for longer than a few years. Most of the companies are zombie companies which are overvalued. Some are undervalued. It's not a good business to be in; there are some real evil schmucks playing games with information that isn't public.


This is cool! Signed up. The phone number field seems unnecessary to even collect, and the UI for it is bad.


Got it! The phone number is no longer required.


This is very cool! Would be amazing if you could pass in data in a certain format (like the SEC data) from non-US companies as well and view/compare data in the same way.


That would be in the future for sure, US data is just much easier to work with right now.


How often is the database updated? For instance, can the stock price data be considered real time (or 15-minute delayed) for daily market tracking? I'm not familiar with that SEC API.


The stock data is refreshed nightly, and the SEC data monthly, once it is available to download. There is no guaranteed data delivery SLA from the SEC that I know about, so I load it monthly once it arrives. I talk about this in more detail on the website, and I will also update the data documentation tab and the tutorials. https://tesseractanalytics.ai/finance-data-quarterly-data-do...


Data in EDGAR is usually available the next business day after a company files its documents.


Yes, but they do not necessarily offer it as a dump right away. So I am still relying on monthly dumps.


They do. Here are the daily index files referencing the filings for April: https://www.sec.gov/Archives/edgar/daily-index/2023/QTR2/


Thank you! Yes, but I am using their monthly dumps for now, because it was easier for data warehousing. I will put it in my plans to pull data daily incrementally in the future as a next phase.
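
A minimal sketch of where an incremental pull could start: working out which daily-index directories cover a date range, based only on the year/quarter layout visible in the URL linked above. The helper names are hypothetical, and the sketch builds URLs without fetching or parsing anything.

```python
from datetime import date

# Base path taken from the link in the thread; everything else is an assumption.
DAILY_INDEX_BASE = "https://www.sec.gov/Archives/edgar/daily-index"


def quarter_of(day: date) -> int:
    """Return the calendar quarter (1-4) that contains `day`."""
    return (day.month - 1) // 3 + 1


def daily_index_dir(day: date) -> str:
    """Build the daily-index directory URL for the quarter containing `day`."""
    return f"{DAILY_INDEX_BASE}/{day.year}/QTR{quarter_of(day)}/"


def dirs_between(start: date, end: date) -> list:
    """List every quarterly directory touched by the inclusive range start..end."""
    dirs = []
    year, quarter = start.year, quarter_of(start)
    while (year, quarter) <= (end.year, quarter_of(end)):
        dirs.append(f"{DAILY_INDEX_BASE}/{year}/QTR{quarter}/")
        quarter += 1
        if quarter > 4:  # roll over into the next year
            year, quarter = year + 1, 1
    return dirs
```

For example, daily_index_dir(date(2023, 4, 19)) gives the QTR2 directory linked above. A nightly job could remember the last date it loaded and only visit the directories returned by dirs_between from that date to today.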


Is it just me, or is this website completely broken on desktop?

EDIT: hold off on that, might be my company intruding on my connection.


I looked at Superset a few years ago. It looked great, but it wasn't popular enough to trust. Is it in the same state now?


Well, it's a top-level Apache project nowadays.


I am not sure what you mean by 'trust it'. As in, does it display the data as it should? It is difficult to find free, open source data visualization tools that run in the browser and that anybody can log in to and use right away. For my personal analytics I use Tableau, connected to the same dataset.


I meant that it wasn't popular enough to use, because it might be buggy and/or won't be around in 5 years.


It got better and better and is still around.

Its underlying technology, both server side (SQLAlchemy, pandas) and client side (React, Ant Design, Apache ECharts), is future-proof.


I see. Well, it's not as slick as paid data visualization tools, for sure.


For an open source Tableau alternative that runs in the browser, you can try graphic-walker: https://github.com/Kanaries/graphic-walker


Thanks for the tip! I will check it out!


Honestly, it's somewhat buggy, especially the asynchronous and cached queries, which were quite important to me.


Where do you get daily stock prices data from?


At first glance, Superset seems a bit limited compared to Grafana.


Grafana has been going down the path of selling out, with the parent company steering users toward its other offerings to make money.


I would respectfully disagree. Their pace of introducing new features & products, support for many datasources, plugins, etc. has been impressive over the past 3 years. This applies to all their versions (free OSS version, free Cloud, and paid Cloud).


What do you recommend instead?


I could try playing around with Grafana as well, thanks for the tip!


How can I log into the portal?


There is a registration form that automatically creates your Superset account. https://tesseractanalytics.ai/financial-database-registratio...

It also creates a WordPress account for you, in case you want to blog about some discovery.


Is there a way to register without it also creating a WordPress account?


Yeah, I suppose I can remove that feature. I just thought people might want to add comments or blog about my tutorials. Just message me via Contact Us and I will remove you from WordPress.


That sounds like a good use case. How about making it optional?


I can add a button for that.


nice work, how big is the dataset? @tessbi


Well, the quarterly one is much smaller than, say, the daily one. SELECT count(*) FROM superset.company_financials_quarterly; returns 458,573. I have >6000 companies, with over 80 features/columns. I cannot expose the daily data through Superset, because people will query the whole dataset without filters, and it will be too slow. But I have it just in case people want the daily stock charts in the future. Cheers.



