Show HN: Investorsexchange.jl – parse trade-level stock market data in Julia (github.com/lukemerrick)
136 points by lukemerrick on Aug 31, 2022 | 63 comments
Backstory:

I wanted to play with intraday stock data but couldn't find a free dataset anywhere. IEXCloud [1] offers API access to 1-minute granularity intraday historical price data, but I was worried that it could get expensive or unwieldy to build up a substantial dataset via API calls. Plus, IEX gives out their raw data for free.

I probably should have just used the IEXTools python library [2] to parse IEX's raw data dumps, but I was working on a Julia project, so it felt more thematically appropriate to build a new tool from scratch.

I haven't been actively using InvestorsExchange.jl a lot lately, but it's made me the proud owner of a 50GB SQLite DB dump covering several years of trade data, and I think it would be awesome if I could help folks in the HN community more quickly build up this kind of dataset for their own curiosity or research.

Feedback is also greatly appreciated!

[1] https://www.iexcloud.io/docs/api/#historical-prices

[2] https://github.com/lvfrazao/IEXTools




Investor Sex Change?


HN is special, and you all here in this comment tree are the best of the best -- love the fact that everyone is having fun with this in such a civil and historically knowledgeable way.

For anyone who wants the naming backstory, InvestorsExchange.jl was originally IEXTools.jl, but Julia's package registration automatic name checks didn't like it ("Name does not meet all of the following: starts with an uppercase letter, ASCII alphanumerics only, not all letters are uppercase. Name is not at least five characters long") [1]. So to Wikipedia I went to find the non-acronym name of the IEX exchange, which is "Investors Exchange" [2]. Thank you all for helping me understand why IEX goes by IEX in all of their branding.

[1] https://github.com/JuliaRegistries/General/pull/27989

[2] https://en.wikipedia.org/wiki/IEX
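For anyone curious, the rules quoted in that error message can be sketched as a small checker. This is only an illustration of the four quoted rules, written in Python; the function name `check_package_name` is made up here, and this is not the registry's actual implementation.

```python
def check_package_name(name: str) -> list[str]:
    """Return the quoted rules that `name` violates (empty list if none).

    Hypothetical helper: it mirrors only the rules quoted in the error
    message above, not the real registry checks.
    """
    problems = []
    # Rule: starts with an uppercase (ASCII) letter.
    if not (name[:1].isascii() and name[:1].isupper()):
        problems.append("does not start with an uppercase letter")
    # Rule: ASCII alphanumerics only.
    if not (name.isascii() and name.isalnum()):
        problems.append("contains characters other than ASCII alphanumerics")
    # Rule: not all letters are uppercase.
    letters = [c for c in name if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        problems.append("all letters are uppercase")
    # Rule: at least five characters long.
    if len(name) < 5:
        problems.append("is shorter than five characters")
    return problems


print(check_package_name("InvestorsExchange"))
print(check_package_name("IEX"))
```

Under these assumptions, a short all-caps acronym trips both the uppercase rule and the length rule, while the spelled-out name passes.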


Domain names of the old internet:

- penisland.net (sells pens)

- expertsexchange.com (renamed to experts-exchange.com)

- therapistfinder.com (find a therapist)

- whorepresents.com


Not just the old internet! Ever considered visiting the Sierra Nevadas? Well there's a wonderful, outdoorsy place in California that's just waiting to welcome you: Lake Tahoe! Visit, uh, gotahoe.com, and keep in mind that the Nevada border, where certain things unrelated to outdoor sports are entirely legal, is right there.



Sort of? "Tahoe" isn't the problem. Sure, "hoe" is a substring of "tahoe" but it's pretty clear that "tahoe" is a placename in a string like "visitlaketahoe.com". IMO they absolutely knew what "gotahoe.com" would be parsed as, it's very clever. It'd be a Scunthorpe if the site was getting flagged for "hoe", but it's a parsing problem at the user layer, not censorship elsewhere.


Yeah, it's more a "CU in the NT" https://ntunofficial.com


> they absolutely knew what "gotahoe.com" would be parsed as, it's very clever

I'm not so sure :) goblah.com is one of the most popular tourism domain configurations that exist.


My employer did some work early in the dot-com era for a company that owned the domain name "manufacturersexchange". When I pointed out that it was a terrible name for a business, they said "no business person would notice that". The business never got off the ground.


I mean even with a dash it's really not a very good name for a business. What does the company even do?

Mind you, self-descriptive company names and domains are more useful for local small businesses; big ones like Google start to get their own meaning. ycombinator? What?


They wanted to start a manufacturing equipment business like eBay, but for factories, which itself was rather silly given how difficult shipping machines would be. They wanted a preliminary design for a 3 month period, but took 3 months to negotiate the contract, so in the end I worked without any input. I think they were completely clueless about the internet or how difficult such a business would be to run, or even market.


Some businesses really can benefit from a good or memorable domain name.

But that market is much, much smaller than it used to be, and most businesses can live or die no matter what their domain name is.


I wonder what their argument would have been. "Business people just aren't observant enough to notice things in general shrug" ?


Based on their logo I'm pretty sure Pen Island knows exactly what they're doing.


Your pen is our business


Missing a comma after 'is'


The penisland website uses a line break:

Your pen is

Our business!


"We Specialize In Wood"


Looking at the FAQs they seem to know their business.

Q: Can I provide my own wood? A: In most cases we can handle your wood. We do require all shipments to be clean, free of parasites and pass all standard customs inspections.


I remember one called penismighty.com, which was a silly online community similar to, well, nothing, but I guess the closest thing would be somethingawful.com. It even had alternative themes that leaned into the different interpretations of its name, which was awesome.


slutsofinstagram.com

> Welcome to Slütsof In Stagrâm an online fantasy series about a Princess duck named Slütsof and her adventurous journey across the mystical lands of Stâgram. Join her as she leaves her castle, searches for her long lost brother Brösof The Magical Goat and challenges a tyrant King who has claimed all of Stagrâm for himself. Here is some concept art to ignite your imagination! --New submissions below!

(The author of the site originally registered the domain for other reasons. Instagram tried to take legal action so they pivoted and back solved the purpose of the domain and why it wasn't infringing on the Instagram trademark)


- rim.jobs (Research in Motion (Blackberry) job postings)


dicks.com was previously NOT Dicks Sporting Goods...


Visiting all these cracked me up, thanks


I entered this post knowing I would upvote this comment.


I too am here for this reason. I would change the name. It is an issue. Put a dash in it or an underscore... just something.


You would be surprised by just how many investors are looking for this type of surgery...many many people.


Underscore is (was?) illegal in domain names.


It's not illegal in domain names. Take a look at the SRV type in the domain name system. It's underscores galore.

However, for host names or for a URL's host component, you're limited to hyphens.

In this case though it's just the name of a Julia package[1], which isn't bound by either the rules of domain names or host names as .jl is not a TLD.

[1]: https://pkgdocs.julialang.org/v1/creating-packages/#Package-...
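To make the host name vs. domain name distinction concrete, here is a minimal Python sketch of a host name check, assuming the usual letters-digits-hyphens rule (labels of 1 to 63 characters, no leading or trailing hyphen). The function name `is_valid_hostname` is hypothetical, and the check ignores internationalized names.

```python
import re

# One host name label: letters, digits, hyphens; 1-63 characters;
# no hyphen at the start or end. Underscores are deliberately excluded.
HOST_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Return True if `name` looks like a valid host name (sketch only)."""
    if name.endswith("."):
        name = name[:-1]  # allow a single trailing root dot
    if not name or len(name) > 253:
        return False
    return all(HOST_LABEL.fullmatch(label) for label in name.split("."))


print(is_valid_hostname("experts-exchange.com"))    # hyphen is fine
print(is_valid_hostname("investors_exchange.com"))  # underscore is rejected
print(is_valid_hostname("_sip._tcp.example.com"))   # fine for SRV, not a host name
```

The last case is the point made above: a name like `_sip._tcp.example.com` is a perfectly good owner name for an SRV record in DNS, but it fails the stricter host name rule.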


https://www.theregister.com/2001/03/16/transsexuals_drawn_to...

Takes me back to the days of ExpertsExchange, the biggest predecessor to Stack Overflow. They were essentially forced to change their domain to Experts-Exchange.


Like my other favorite play on words of that kind: the famous “Pen Island”


Yes, I'd also say that the name/domain is a bit unfortunate.


I am pretty sure it's a deliberate inside joke from expertsexchange days of yore


Quite a unique niche, I'd say go for this market.


They must use this at HRT


High-f(r)equency Trading


He was definitely successful in raising some eyebrows, piquing people's interest, and ultimately getting a lot of clicks!


Oh, good, I wasn't the only one.


They don't allow you to have jokes in here.


Nice, very nice!


I am here for the comments only. :)


One file has a size of nearly 5 GB... Every day two of these files get released... So nearly 10 GB every day.

So if we download the raw data every day for one year, we would have 3650 GB just for one year...

It would be interesting to know how much smaller the processed data is than the raw data. You say that you have 50 GB of data spanning multiple years. How many years exactly?


Ah, this is a nuanced point I totally left off the README. Each raw file is ~5GB, but the raw files are a dump of network traffic from the firehose feed that tracks not just trades, but also updates to orders that do not result in trades. If you skip all of the algorithmic bots' constant updates to their bid/offer spreads and look at just the trades that clear, you can store years of raw trades in under 100GB.


Yes, datasets of this type are massive. We use TAQ for research work but almost never use the raw data as is.

> So if we download the raw data every day for one year, we would have 3650 GB just for one year.

A small correction - the stock market is open only about 250 days a year so with your calculations the raw data size will be 2500 GB. Still massive.
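The back-of-the-envelope numbers, as a quick Python sketch. The constants are the rough figures quoted in this thread (about 5 GB per file, two files per day, about 250 trading days), not measured values, and `yearly_raw_gb` is just an illustrative helper.

```python
GB_PER_FILE = 5      # approximate raw file size quoted in the thread
FILES_PER_DAY = 2    # two files released per day, per the thread
CALENDAR_DAYS = 365
TRADING_DAYS = 250   # rough number of days the market is open per year


def yearly_raw_gb(days, gb_per_file=GB_PER_FILE, files_per_day=FILES_PER_DAY):
    """Estimated raw download volume in GB for the given number of days."""
    return days * gb_per_file * files_per_day


print(yearly_raw_gb(CALENDAR_DAYS))  # 3650, the original estimate
print(yearly_raw_gb(TRADING_DAYS))   # 2500, counting trading days only
```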


It's good that the actual file is InvestorsExchange.jl, because without the E capitalized, that's definitely not how my brain parses it.


"Invest or sex change" is how my brain initially parsed it, lol. Was your initial reaction the same?


Is that a threat?


Is it an opportunity?


When you say they give out the raw data for free, does this mean you can download a batch of it? Or are you referring to the free API for the 5 years/30 day data on the page you linked?


Look at this guy, not commenting about the name.


Referring to these gzipped daily (two formats) dumps I think: https://iextrading.com/trading/market-data/

API only needed for live (incl. past day) and more convenient granular access/computations it seems.


Reminds me of the time the company that owns BlackBerry decided to post some listings on the new-at-the-time .jobs TLD...


I interned at RIM back in the day and every single person I went to school with asked how I liked my RIM job.


Maybe that's why they were single.


I’ll get your coat.


When looking at the linked archives [1] it seems that they sit at around 4GB/day. Is that really all of the data that the US stock market generates a day? I thought it would be at least 10x that.

[1] https://iextrading.com/trading/market-data/


It's only data for flows going through IEX, which is less than 3% of market volume

https://iextrading.com/stats/


amazing domain name


It's not a domain name, there is no .jl TLD


ah yea, but investorsexchange.com is available :D


I'm surprised it's for sale and not already acquired by the guys at IEX. Given their trademark they are the only likely buyers so I assume the current owner has been holding out on them for a better deal.


I want to see this pitch deck


I upvoted this just to keep the name on the front page longer.



