
Sad that it is English only, not multilingual.

Indeed, this is a major blocker. I am very wary of installing it and would never ship this to production.


Wow, that's super interesting! Thank you, I will look into it!


Well, I meant to make a beautiful poster at first. But you can see which cells were important at each stage of the game!


Wow, front page!

The current visualization is far from perfect, but it was hard for me to put more information there. Please share your ideas to improve this visualization or to make others!

I mostly aimed at making an aesthetically pleasing image that would represent which cells were controlled and which moves were used the most.

As for usage examples, it's very easy to see the difference between European and Indian openings (the former advancing in the center and the latter on the sides) and it's quite easy to guess who won by looking at who controlled the most cells last.

On the tech side, this is a single-file, local-first, vanilla JS app that queries the (unofficial) chessgames.com API through corsproxy.io (because of CORS). I then draw with SVG elements and use canvg [1] to produce PNG images. The JS code is embedded in the HTML, so you can read it just by viewing the source (or look on GitHub [2]). I also maintain a Python version that produces the same outputs as the browser version.

[1]: https://github.com/canvg/canvg

[2]: https://github.com/louisabraham/chessviz


I actually quit a quant trading job after 2 weeks because they used kdb+. I could use it but the experience was so bad...

People could complain about the abysmal language design or debugging, but what I found most frustrating was the coding conventions they had (or had not), and I think the language and the community play a big role there. But the company culture matters too: I asked why the code was so poorly documented (no comments, single-letter parameters, arcane function names). "We understand it after some time, and this way other teams cannot use our ideas."

Overall, their whole stack was outdated and of course they could not do very interesting things with a tool such as Q. For example, they plotted graphs by copying data from qStudio to Excel...

The only good thing was that they did not buy the Docker / k8s BS and were deploying directly on servers. It makes sense that quants should be able to fix things in production very quickly, but I think it would also make sense for web app developers not to wait 10 minutes (and that's when you have good infra) to see a fix in production.

I have a theory on why quants actually like kdb: it's a good *weapon*. It serves some purpose but I would not call it a *tool* as building with it is tedious. People like that it just works out of the box. But although you can use a sword to drive nails, it is not its purpose.

Continuing on that theory, LISP (especially Racket) would be the best *tool* available: it is not the most powerful language out of the box, but it lets you build a lot of abstractions, with features to modify the language itself. C++ and Python are just great programming languages as you can build good software on them, Python also being a fairly good weapon.

Q might give the illusion of being the best language to explore quant data, but that's just because quants do not invest enough time into building good software and using good tools. When you actually master a Python IDE, you are definitely more productive than any Q programmer.

And don't get me started on performance (the link covers it anyway even though the prose is bad).


The article calls out Python and DuckDB as possible successors.

I remember being very impressed by kdb+ (I went to their meetups in Chicago). Large queries ran almost instantaneously. The APL-like syntax was like a magic incantation that only math types were privy to. The salesperson mentioned kdb+ was so optimized that it fit in the L1 cache of a processor of the day.

Fast forward 10 years. I'm doing the same thing today with Python, DuckDB, and Jupyter on Parquet files. DuckDB not only parallelizes, it vectorizes. I'm not sure how it benchmarks against kdb+, but the responsiveness of DuckDB at least feels as fast as kdb+ on large datasets. (Though I'm sure kdb+ is vastly more optimized.) The difference? DuckDB is free.
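For a sense of the workflow, a query over a directory of Parquet files can be as short as this (the file layout and column names are made up for illustration):

  import duckdb

  # Aggregate straight off Parquet files on disk; DuckDB parallelizes
  # and vectorizes the scan under the hood.
  con = duckdb.connect()
  result = con.sql("""
      SELECT symbol, count(*) AS n, avg(price) AS avg_price
      FROM 'trades/*.parquet'
      GROUP BY symbol
      ORDER BY n DESC
  """).df()  # back to a pandas DataFrame for Jupyter
  print(result.head())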


We use DuckDB similarly but productionize by writing pyarrow code. All the modern tools (DuckDB, pyarrow, Polars) are fast enough if you store your data well (Parquet), though we work with not-quite-"big data" most of the time.
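As a rough sketch of the pyarrow side (the dataset path and column names here are hypothetical):

  import pyarrow.dataset as ds
  import pyarrow.compute as pc

  # Read only the columns and rows we need from a Parquet dataset.
  dataset = ds.dataset("data/events", format="parquet")
  table = dataset.to_table(
      columns=["user_id", "amount"],
      filter=pc.field("amount") > 0,
  )
  print(table.num_rows)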

It’s worth remembering that all the modern progress builds on top of years of work by Wes McKinney & co (many, many contributors).


Yes, Wes McKinney was involved in Pandas, Parquet, and Arrow.


I remember reading a while back that when building pandas he was getting a lot of inspiration from things like APL and, I assume, kdb+.


I just realized all the data tools I use are animals.

Pandas

Polars (polar bear)

DuckDB

Python


Do you use duckdb for real-time queries or just historical? You mentioned parquet but afaik it's not well suited for appending data.


Also a tip: for interactive queries, do not store Parquet in S3.

S3 is high-throughput but also high-latency storage. It's good for bulk reads, but not random reads, and querying Parquet involves random reads. Parquet on S3 is ok for batch jobs (like Spark jobs) but it's very slow for interactive queries (Presto, Athena, DuckDB).

The solution is to store Parquet on low-latency storage. S3 has something called S3 Express One Zone (low-latency S3 that costs slightly more). Or EBS, which is block storage that doesn't suffer from S3's high latency.


You can do realtime in the sense that you can build NumPy arrays in memory from realtime data and then use these as columns in DuckDB. This is the approach I took when designing KlongPy to interop array operations with DuckDB.
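Roughly this pattern, for anyone curious (it goes through a pandas DataFrame wrapper here; KlongPy's actual interop layer may look different):

  import duckdb
  import numpy as np
  import pandas as pd

  # Columns built in memory, e.g. from an (imaginary) realtime feed.
  ts = np.arange(1_000_000)
  price = np.random.rand(1_000_000)
  df = pd.DataFrame({"ts": ts, "price": price})

  # DuckDB can query the in-memory DataFrame by name via replacement scans.
  print(duckdb.sql("SELECT max(price) FROM df").fetchall())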


Not real time, just historical. (I don't see why it can't be used for real time though... but I haven't thought through the caveats.)

Also, I'm not sure what you mean by Parquet not being good at appending? On the contrary, Parquet is designed for an append-only paradigm (like Hadoop back in the day). You can just drop in a new Parquet file and it's appended.

If you have 1.parquet, all you have to do is drop 2.parquet in the same folder or Hive hierarchy. Then query:

  SELECT * FROM '*.parquet'
DuckDB automatically scans all the Parquet files in that directory structure when it queries. If there's a predicate, it uses Parquet header information to skip files that don't contain the data requested, so it's very fast.

In practice we use a directory structure called Hive partitioning, which helps DuckDB do partition elimination to skip over irrelevant partitions, making it even faster.

https://duckdb.org/docs/data/partitioning/hive_partitioning
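For example, with a layout like data/date=2024-01-02/part-0.parquet (the path and the partition column are invented here), a query like this only touches the matching partition:

  import duckdb

  # hive_partitioning exposes the 'date' directory key as a column,
  # and the WHERE clause prunes the other partitions entirely.
  duckdb.sql("""
      SELECT count(*)
      FROM read_parquet('data/*/*.parquet', hive_partitioning = true)
      WHERE date = '2024-01-02'
  """).show()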

Parquet is great for appending!

Now, it's not so good at updating, because it's a write-once format (not read-write). Updating a single record in a Parquet file entails regenerating the entire file. So if you have late-arriving updates, you need to do extra work to identify the partition involved and overwrite it. Either that, or use bitemporal modeling (add a data-arrival timestamp [1]) and add a latest-date clause to your query (entailing more compute; a sketch follows below). If you have a scenario where existing data changes a lot, Parquet is not a good format for you. You should look into Timescale (a time-series database based on Postgres).

[1] https://en.wikipedia.org/wiki/Bitemporal_modeling
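A sketch of that latest-arrival pattern in DuckDB (the file path and the id/arrival_ts columns are hypothetical):

  import duckdb

  # Keep only the most recently arrived version of each record.
  duckdb.sql("""
      WITH ranked AS (
          SELECT *,
                 row_number() OVER (PARTITION BY id ORDER BY arrival_ts DESC) AS rn
          FROM 'events/*.parquet'
      )
      SELECT * FROM ranked WHERE rn = 1
  """).show()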


Not surviving more than two weeks in a QF role because of kdb, and then suggesting they should rewrite everything in LISP, is one of the more recidivous HN-level comments I think I have ever seen.


You didn’t learn Q in two weeks to the extent that you are qualified to assert that someone who knows how to use a Python IDE is more productive than a quant dev with decades of experience.

I find it much more likely that you couldn’t understand their code and quit out of frustration.

If you were a highly skilled quant dev and this was a good seat, quitting after two weeks would have made the next transition a disaster to manage, given the terms these contracts always have.


Their pykx integration goes a long way toward fixing some of the gaps in:

- charting

- machine learning/statsmodels

- html processing/webscrapes

Because for example you can just open a Jupyter Notebook and do:

  import pykx as kx
  import matplotlib.pyplot as plt

  df = kx.q("select from foo where bar")
  plt.plot(df["x"], df["y"])

It's truly an incredibly seamless and powerful integration. You get the best of both worlds, and it may be the saving feature of the product in the next 10 years.


I think this will only work with regular qSQL on a specific database node, i.e. RDB, IDB, HDB[1]. It will be much harder for a mortal Python developer to use Functional qSQL[2] which will join/merge/aggregate data from all these nodes. The join/merge/aggregation is usually application-specific and done on some kind of gateway node(s). Querying each of them is slightly different, with different keys and secondary indices, and requires using a parse tree (AST) of a query.

---

[1] RDB = RAM DB (recent in-memory data); IDB = Intraday DB (recent data which doesn't fit into RAM); HDB = Historical DB (usually partitioned by date or another time-based or integral column).

[2] https://code.kx.com/q/basics/funsql/


That's accurate enough. I think the workflow was built more for a q dev occasionally dipping into Python than the other way around.

I think you touch on something really interesting, which is the kink in the kdb+ learning curve when you go from really simple functions, tables, etc. to actually building a performant kdb architecture.


It will be interesting to see what comes of some of the things on their roadmap (https://code.kx.com/pykx/2.5/roadmap.html#upcoming-changes). They seem to be moving toward an API similar to Polars.


[flagged]


It's not a good filter in that case. I can learn obscure languages just fine, but that doesn't make me any more pleasant to hang out with.


I'm not sure that was ever a requirement in these industries


It is not a requirement. Just a way to weed out people who think they are special snowflakes.


I'm perfectly capable of learning obscure language _and_ thinking I'm a special snowflake. (In fact, I'm a special snowflake _because_ I am into weird languages.)



This. An ex-girlfriend would get really mad at me because I wanted to take frequent breaks in museums. I wish we had known about Museum Fatigue.


I'm sorry, I think I made the server crash with the following JS code:

    // set every <input> on the page to checked
    Array.from(document.getElementsByTagName("input")).forEach(e => {e.checked = true})


@dang maybe the title should reflect the originality of the project, i.e. the fact that it's all AI-generated?

