Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: I've built a MySQL proxy that supports online DDL (github.com/wesql)
39 points by earayu on Dec 16, 2023 | hide | past | favorite | 14 comments
Hi everyone. Over the past year, I've built a MySQL Proxy that supports Online DDL. I'm not sure if there are other proxies out there with this feature, but I found it fascinating and thought it was worth sharing. Of course, this proxy also supports many other features, such as Read Write Split, Read After Write Consistency, Load Balancing, and Transparent Failover. I didn't build this database from the bottom up; instead, I forked Vitess and put a lot of effort into overhauling its architecture and SQL Engine Layer. Moving forward, I hope to introduce more developer-friendly features into this Proxy. The next major feature should be something similar to the Branch functionality of PlanetScale.



Both PlanetScale and Neon have the branch feature, and it significantly simplify the process of testing and staging. However, the feature in PlanetScale is closed-source, nice to see an open-source alternative of that.


I'm not familiar with the concept of 'online' DDL. Does asynchron means I sacrifice consistency in this example?

   ALTER TABLE ADD COLUMN foo; 
   SELECT * FROM table;
   -- may return column foo, or may not?
DDL is still not safe to use in transactions, right?


Online DDL is typically executed asynchronously. This has the benefit of making DDL more manageable and not impact the availability of the database due to MDL locks or consuming too many resources. After executing an Online DDL, a JobId will be returned. Subsequently, the progress can be checked using the JobId, and some control commands can also be executed, such as suspending, resuming, or canceling.


As for the transactional nature of DDL - MySQL does not support using DDL in transactions (each DDL is a separate transaction).


Can I think of it as open source alike of PlanetScale? And why not work on / with Vitess instead?


Sounds great. Can't wait for the Branch functionality.


More exactly how does a proxy add an online DDL capability to the presumably remote database?

Or is a database proxy something completely different than eg a web proxy?


Interesting! How is the performance comparing to other MySQL compatible proxies anyway?


https://github.com/wesql/wescale/blob/main/doc%2Fblogs%2FPer...

Integrating WeSQL WeScale as a MySQL proxy may initially increase network latency due to additional round trips. However, as the load scales, it matches the throughput and latency of a direct MySQL connection. Connection multiplexing allows WeSQL WeScale to sustain high throughput under heavy loads, unlike MySQL, which may suffer performance drops due to too many connections consuming buffer pool memory. With read-write splitting enabled, WeSQL WeScale continues to scale linearly in throughput beyond MySQL's limits as test threads increase. The memory footprint of WeSQL WeScale is minimal in comparison to MySQL, and it only adds about one-third to the CPU overhead. For increasing queries per second (QPS), WeSQL WeScale's read-write splitting and horizontal scaling is more cost-effective than MySQL's vertical scaling.


Can I think of it as open source alike of PlanetScale?


On a somewhat related note, does anyone know of a web based SQL client app that we can install as a sort of proxy to cloud based DBs?

Company policy puts all kinds of restrictions on desktop connections to DBs but all the data scientists want direct client access so they can use robust software instead of using psql from a jump box.

The objective is mostly logging DDL actions and limiting certain DDL operations.


What prevents SSH port forwarding from the jump box (besides, possibly, policy)? That’s been my go to whenever indirect access is required and tools want to connect directly to hosts.


If I've understood your needs correctly Gravitational Teleport should work for you.


We're going through the same search right now.

I haven't found anything that would suit our ideal use-case (which would be a kind of "collaboratively-editable web Jupyter-notebook server for psql DDL maintenance sessions" — but without the edit-and-replay-in-the-same-cell part, because it should also be its own audit log; and with an "each psql session runs indefinitely and streams output back to the output cell" property that Jupyter doesn't have. Or basically, "something that is to a shared tmux session as Slack is to IRC.") I'm half motivated to just build this myself, since everything about it is a constant need.

Less ideal, but still nice, would be a proxy that creates an SQL statement audit log for all statements executed through it (where you'd then make it so that your devs can only connect through it.) I believe the pgpool proxy can do this — see https://www.pgpool.net/docs/latest/en/html/runtime-config-lo....

> Company policy puts all kinds of restrictions on desktop connections to DBs

Presuming these restrictions exist in your company only to ensure that developers' more casually-guarded security credentials aren't exfiltrated from their machines by a virus or a phishing attack and used to attach to your servers from some external IP: consider putting your developer workstations and your DB servers together onto an enterprise-managed Tailscale overlay-network, and then putting a line in pg_hba.conf such that DB users in the "developers" DB group can only connect from the overlay-network's IP range. (AFAIK it's very hard to exfiltrate a logged-in Tailscale session and reuse it elsewhere.)

However, presuming you care more about this:

> The objective is mostly logging DDL actions and limiting certain DDL operations.

One "hack" approach that will get you just these two properties, but otherwise keep things to what you're already doing, is a pretty old-school one: command-line-specific /etc/sudoers lines. Specifically, you would:

1. Make a DB user called ddl_maintenance; make this DB user, and only this DB user, a pseudo-superuser capable of executing DDL statements. Your developer DB users should not have this privilege.

2. On the jump box, create a ddl_maintenance OS user, with /sbin/nologin as their login shell.

3. Also give the ddl_maintenance user a ~/.pgpass file, containing their DB credentials (so that psql can connect automatically when run as this user). chmod it 400, so that other users won't learn these credentials.

4. Write a policy-enforcer program, that parses a .sql file, and exits 0 if every statement in that file is safe to run per your company policy — or with some other exit code if it's not. (Use https://github.com/pganalyze/libpg_query, or one of its HLL bindings. I'd probably recommend the Golang one for simplest deployment.)

5. Write a glue script that receives a path to a .sql file as $1 (and/or a stream of SQL on stdin); buffers it to a spool dir, so that it can be streamed more than once; tests it for conformance with your policy-checker program from #4; and if it passes, executes it with psql. (Don't pass on arbitrary flags to psql — you want to force your devs to use psql a specific, auditable way.) You want something that looks like this:

   if ddl-looks-good "${sql_in_spool}"; then
     psql -a -f "${sql_in_spool}" | tee -a "/var/log/ddl_maintenance.log"
   fi
6. Add a line to /etc/sudoers, that allows users to sudo as ddl_maintenance without a password, but only to execute this glue script using sudo -c.

7. (optional) wrap the required sudo incantation in a friendly wrapper script, so the devs don't have to learn Weird Sudo Tricks to get their job done. Document this final wrapper script; treat everything else as implementation details.

---

All that being said, if you have a whitelist of DDL commands you want your devs to run... just make some SECURITY DEFINER stored procedures that do those things (and which audit-log into one or more tables or THROW NOTICEs into the PG log), and grant your DB "developers" group EXECUTE privilege on those sprocs. Problem solved :)




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: