Hacker News new | past | comments | ask | show | jobs | submit login
An SQL Solution for Jupyter (jupyter.org)
139 points by 3l3ktr4 11 months ago | hide | past | favorite | 32 comments



This is amazing. Jupyter is already a great tool for data science and being able to directly interact with SQL (without a host/intermediate language) is great, because it is very clean and SQL is a powerful language on its own – no need for some host language. Furthermore, vega (https://vega.github.io/vega-lite/) is an amazing kind-of-declarative visualization language which mixes great with the declarative language SQL.


thanks for the kind words! :)


The only example of how to use it is an animation that is just impossible to read :(

Here are the examples in the live demo[1]:

    %LOAD sqlite3 db=chinook.db timeout=2 shared_cache=true
    %XVEGA_PLOT X_FIELD EmployeeId Y_FIELD ReportsTo BIN TRUE MARK square WIDTH 100 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
    %XVEGA_PLOT X_FIELD EmployeeId Y_FIELD ReportsTo BIN MAXBINS 3 MARK bar COLOR red WIDTH 200 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
    %XVEGA_PLOT X_FIELD EmployeeId Y_FIELD ReportsTo TYPE ordinal MARK bar COLOR green WIDTH 200 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
    %XVEGA_PLOT X_FIELD Name TYPE nominal Y_FIELD ArtistId BIN MAXBINS 1 MARK line COLOR purple WIDTH 200 HEIGHT 200 <> SELECT Name, ArtistId FROM artists LIMIT 10
    %XVEGA_PLOT X_FIELD EmployeeId GRID false Y_FIELD ReportsTo MARK area COLOR pink WIDTH 200 HEIGHT 200 <> SELECT EmployeeId, ReportsTo FROM employees
[1] https://hub.gke2.mybinder.org/user/jupyter-xeus-xeus-sql-x64...


hey @andreareina! Actually we have a binder on the repo that you can try + several examples :) link to binder: https://mybinder.org/v2/gh/jupyter-xeus/xeus-sql/stable?urlp... link to examples:https://github.com/jupyter-xeus/xeus-sql/tree/master/example...


Thanks! I'm a fan of notebook-based work and just interactive programming in general and even if I don't use Jupyter myself it's exciting to see innovation in the space.

My complaint is really more that for this sort of thing I feel examples should be front and center; Jupyter "sells" a better experience, so show me how easy and ergonomic it is to use!

Hope that helps. Right now it looks like the sql parts are kind of the equivalent of running a subshell and displaying the results, with no communication to the "main" code. Is this accurate? Is there a way forward in the future to e.g. pull the data down with sql and then do some further munging in another language?


>My complaint is really more that for this sort of thing I feel examples should be front and center; Jupyter "sells" a better experience, so show me how easy and ergonomic it is to use!

Hum, I see. I thought you haven't found the binder, sorry. It's a good point! I'll make sure to pay attention on that on the next posts :)

>"main" code. Is this accurate? Is there a way forward in the future to e.g. pull the data down with sql and then do some further munging in another language?

I'm not sure if I follow what you mean with the main code. you mean having access to each individual value of a query result, for example, to be able to manipulate it? if that's what you mean, that's what we receive in the C++ backend code that runs on this kernel. so as long as you can integrate new libraries to this code you'd be able to manipulate the results of your query.


Would not the latter be the task of a language library? i.e. Pandas with read_sql()?


I think so too. It'd be amazing to have python to receive this output and do stuff with it as people said in other comments.


I know you pronounce it ES-KYU-EL because you used "an"


It's more common than "sequel" in non english speaking places IME.


Yep, it through me off, I saw it somewhere years ago that the pronunciations were:

SQL = SEE-KWUHL MySQL = MY-ES-KYU-EL

And it's now engrained in my mind and this title through me off ;)


"An SQL", "through me off"....

I just woke up and my head is already spinning. ;)


Oh boy, I think I need to go back to bed


"SEE-KWUHL" is actually the pronunciation for MS SQL Server. "ES-KYU-EL" is for all other uses of SQL.


When SQL was established in the early 70s, it was called SEQUEL (Structured English Query Language). However, due to a copyright issue, it was changed it to SQL. In fact, SQL is typically pronounced “sequel” today, but some favor the non-acronym pronunciation of “ess-cue-el” (in case you didn’t know, an acronym is an abbreviation you can pronounce like a word, e.g. SQL or ANSI).

Read more at the ANSI Blog: The SQL Standard – ISO/IEC 9075:2016 (ANSI X3.135) https://blog.ansi.org/?p=158690


sir, it is my-ass-que-el


The Github support for notebooks is so nice (was linked from the example pic caption: https://github.com/wangfenjin/xeus-tidb/blob/develop/example...)


Jupyter would be even better if it supported the seamless combination of Python and SQL code cells.

My notebook code typically involves a data prep stage with querying a SQL database, then downloading into Python for more complex analysis, ML modelling, integration with external data sources, etc. So the notebook has a Python kernel with SQL usually as embedded """-quoted strings.

Does anyone have a solution to treating selected code cells as SQL - with SQL highlighting and tooltips - exposed as string variables to the Python code?

Sparkmagic [1] does part of this for Python/SQL/Spark interoperability, but as far as I recall, doesn't support SQL syntax highlighting.

[1] https://github.com/jupyter-incubator/sparkmagic


Agree this would be awesome. I reckon you could obtain something similar with pandas with pd.read_sql("<query>"), but a dedicated SQL cell akin to markdown cells would make it much more pleasant


There’s a %%bash cell magic which turns one cell into a bash cell. What about a %%xeus cell to run a SELECT (or any sql)? Would that work?



that's pretty amazing work! there have been some efforts in the past to do that: https://github.com/minrk/allthekernels I think it'd be hard to do something like this, but not impossible. Just a lot of work.


I followed the instructions and the sql cells work, but the xvega ones return empty output and I'm not sure how to debug that.

  conda create -n xeus-sql
  conda activate xeus-sql
  conda install xeus-sql soci-mysql soci-postgresql soci-sqlite jupyterlab -c conda-forge
In conda list I see, among others:

  xeus                      1.0.0                h78d96c3_0    conda-forge
  xeus-sql                  0.0.8                h118ccdd_1    conda-forge
  xvega                     0.0.10               h4bd325d_0    conda-forge
  xvega-bindings            0.0.10               h4bd325d_0    conda-forge
Btw there's a typo in the instructions: soci-postresql -> soci-postgresql


Also, for mysql and postgres, are only socket connections supported? No TCP connections?

https://xeus-sql.readthedocs.io/en/latest/PostgreSQL.html#li...


Answering my own question - works as expected, only the documentation is lacking.

%LOAD postgresql host=127.0.0.1 port=5432 dbname=DATABASE user=USERNAME password=PASSWORD


This looks very promising! I will look into it.

When I want a tabular view, I currently either used the pandas read_sql_query method or the PandaSQL module. That does work, and I often do want to use pandas and sql together (often back and forth, depending on the operation).

That said, I can see a use for this tool in my work - I'm often really just interested in interacting with a database in a more visually friendly way than the command line, and it would be great to stay with Jupyter rather than having to go through a completely different UI.

So, yeah, nice looking tool, and thanks for posting!


We have used https://almond.sh/ to create a Spark SQL interpreter using Jupyter Notebooks - plus a whole lot more which you can see here: https://arc.tripl.ai/tutorial

After seeing many companies writing ETL using code we decided it was too hard to manage at scale so provided this abstraction layer - which is heavily centered around expressing business logic in SQL - to standardise development (JupyterLab) and allow rapid deployments.


Does anyone know if can work with 32-bit ODBC drivers? My industry is behind the times and the only way to interact with our ERP is 32-bit ODBC


hey @realityballss, don't really know as I'm not too deep in the SOCI stuff. but we offer full support to SOCI meaning if these folks: https://github.com/SOCI/soci support it and the dependencies exist and work for 32bit, than yes. I'd say it looks a bit like a long shot, but may work.


Those interested in SQL notebooks can also check out https://count.co


Names almost every major sql engine except SQL Server

Says will work with any DB with an ODBC connection

Hmmmm


it’s depending directly of SOCI. I offer a wrapper around everything they do :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: