
Show HN: Serverless Apache Spark on Google Colab - leemoonsoo
https://colab.research.google.com/github/open-datastudio/spark-serverless/blob/master/notebooks/Spark_serverless_on_Colab.ipynb
======
hiyer
I guess serverless is a pretty ambiguous term these days, but if this is
spawning executors on k8s, how is it serverless? If you're calling it that
because you are not managing the k8s cluster, the same would be true of running
Spark on any managed k8s platform, such as AWS EKS. I was expecting maybe
something like this[1], though this is not production-grade.

[1] https://github.com/qubole/spark-on-lambda

~~~
leemoonsoo
Yes, if you already have a k8s cluster that you don't manage and that you can
use for Spark, I would say you have a serverless Spark environment.

IMO, where the Spark executors actually run (on Lambda, on k8s, or somewhere
else) doesn't matter much for whether it counts as serverless, as long as the
user can access the cluster without managing it.

In our case, we build the Spark Serverless service on:

      - Fully managed Kubernetes cluster
      - Isolation between Spark instances (network, storage access, etc.)
      - A dedicated Kubernetes node (VM) allocated to each executor, to provide container-level security and better performance
      - Secure tunneling between the remote driver and the executors for interactive mode (Spark client deploy mode)
      - Various optimizations
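
To make the client deploy mode item more concrete, here is a minimal sketch of the standard Spark properties a remote driver would typically set to run its executors on a Kubernetes cluster. It is not the service's actual configuration: the helper name `client_mode_conf`, the API server address, the image name, and the driver host are all hypothetical placeholders, and the tunneling, isolation, and one-node-per-executor pieces are not shown.

```python
# Sketch only: builds a plain dict of standard Spark properties.
# Host names, the image name, and the helper itself are hypothetical.

def client_mode_conf(api_server, image, driver_host, executors=2, namespace="default"):
    """Return Spark properties for client deploy mode against a Kubernetes master."""
    return {
        # The k8s:// prefix tells Spark to schedule executors as Kubernetes pods.
        "spark.master": f"k8s://{api_server}",
        # Client mode: the driver runs where this code runs (e.g. a notebook).
        "spark.submit.deployMode": "client",
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
        "spark.executor.instances": str(executors),
        # Executors connect back to the driver at this address.
        "spark.driver.host": driver_host,
    }


conf = client_mode_conf(
    api_server="https://k8s.example.invalid:6443",  # placeholder address
    image="example/spark:latest",                   # placeholder image
    driver_host="driver.example.invalid",           # placeholder host
)

for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

With PySpark installed, each pair could be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`. In client mode the executors must be able to reach `spark.driver.host`, which is presumably the gap the tunneling described above is meant to close.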

