Using Scala and Spark in JupyterLab

Timothy Zhang
Mar 28, 2021

Jupyter is a very popular web-based interactive computing platform. It is a great help both for teaching programming languages and for prototyping during project development. As the name JuPyteR suggests, it originally targeted Julia, Python, and R, but it now supports dozens of programming languages and environments.

Jupyter is widely used for learning Python and for Python project development, especially scientific computing and machine learning. As a result, several projects have built Scala kernels for Jupyter, which in turn make Spark computing available in notebooks. Currently, the most popular and most actively developed Scala kernel project is Almond. Brunk gives a more detailed introduction to Almond in the article “Interactive Computing in Scala with Jupyter and almond”. He argues that Jupyter with Scala support combines the advantages of REPLs and Worksheets and, more importantly, adds web-based rich-text output.

JupyterLab is the next-generation Jupyter UI after Jupyter Notebook. This article walks through how I installed, configured, and used Scala and Spark in JupyterLab. My environment is macOS; the key components to install, and the dependencies among them, are shown below:

Installed components and their dependencies

Install Python and Scala

Jupyter needs a Python environment, and we also want Scala support, so the first step is to install both languages on macOS. Because MacPorts is my macOS package manager, I installed both with it. MacPorts can keep multiple versions of Python and Scala installed side by side.

  • According to conda's installation page, the Python version currently required is 3.9, so I used MacPorts to install python39.
  • According to the Scala versions supported by almond, the latest supported Scala version is 2.13.x. However, Spark 3.1.1, the latest Spark release at the time of writing, is pre-built with Scala 2.12, so I installed scala2.12.

sudo port install python39 python3_select
sudo port install scala2.12 scala_select

If you have installed multiple versions of Python or Scala, you can use port select to set the one required:

sudo port select --list python
sudo port select --set python python39
sudo port select --set python3 python39
sudo port select --list scala
sudo port select --set scala scala2.12

Install conda and JupyterLab

Following the conda installation instructions, I chose to install Miniconda. Download Miniconda3-latest-MacOSX-x86_64.sh, and then run it directly:

bash Miniconda3-latest-MacOSX-x86_64.sh

The installer placed Miniconda in my home directory, ~/miniconda3, and added an initialization block to the file ~/.zshrc:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/Users/timothyzhang/miniconda3/bin/conda' 'shell.zsh' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/Users/timothyzhang/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/Users/timothyzhang/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/Users/timothyzhang/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

Because I use the Oh My Zsh framework to manage my Zsh configuration, together with the Spaceship ZSH prompt, a C character on a blue background is displayed in iTerm to indicate the active conda environment:

conda environment with Spaceship ZSH prompt

You can verify that the conda environment's python and pip now take precedence over the system ones:

(base)
~ via 🅒 base
➜ which pip
/Users/timothyzhang/miniconda3/bin/pip
(base)
~ via 🅒 base
➜ which python
/Users/timothyzhang/miniconda3/bin/python

Next, install JupyterLab, following its installation documentation. Although it can be installed with the conda command, that requires the conda-forge channel. I once ran into compatibility problems when installing some conda-forge packages, so I chose the pip command instead:

pip install jupyterlab

Start JupyterLab directly after installation:

jupyter-lab

You can see its UI then:

JupyterLab’s initial Launcher interface

Install almond’s Scala kernel and configure to use Spark

According to Almond's installation documentation, the coursier dependency resolver is required, so I installed coursier first. Following its documentation, I chose the “Native launcher” installation method:

curl -fLo cs https://git.io/coursier-cli-"$(uname | tr LD ld)"
chmod +x cs
./cs

Then install the Scala kernel following almond's documentation. Here I chose the combination of almond 0.10.9 and Scala 2.12:

./cs launch --fork almond:0.10.9 --scala 2.12 -- --install

After restarting JupyterLab, new Scala icons appear under both Notebook and Console:

Jupyter Launcher UI with almond’s Scala kernel installed
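
A quick way to confirm that the new kernel runs the expected Scala version is to evaluate a cell like the one below in a new Scala notebook. This is a minimal sketch that uses only the Scala standard library and the JVM's system properties. With the setup described here, the first line should print a 2.12.x version.

```scala
// Print the Scala version the kernel is running
println(scala.util.Properties.versionNumberString)

// Print the version of the JVM the kernel was launched on
println(System.getProperty("java.version"))
```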

At this point, we can start using almond's Scala kernel to try out various Scala applications. Here is some sample code:
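
A minimal sketch of such a notebook session follows. It is modeled on the pattern shown on almond's Spark usage page and is not necessarily the exact code used for this article. The Spark version (3.1.1), the local master, and the tiny sample data are assumptions for illustration; the Spark artifacts must match the kernel's Scala 2.12, and your almond release may support a different Spark version. Each commented section can go in its own notebook cell.

```scala
// Cell 1: fetch Spark and its mllib package through coursier
// (the 3.1.1 versions are an assumption; use whatever your almond release supports)
import $ivy.`org.apache.spark::spark-sql:3.1.1`
import $ivy.`org.apache.spark::spark-mllib:3.1.1`

// Cell 2: suppress Spark's log output before the session starts
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF)

// Cell 3: create a local Spark session.
// NotebookSparkSession is provided by almond-spark, which almond's documentation
// says is added automatically once spark-sql has been imported.
import org.apache.spark.sql._
val spark = NotebookSparkSession.builder()
  .master("local[*]")
  .getOrCreate()

// Cell 4: build and display a small DataFrame (illustrative data only)
import spark.implicits._
val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")
df.show()

// Cell 5: use a class from the mllib package imported in cell 1
import org.apache.spark.ml.linalg.Vectors
val v = Vectors.dense(1.0, 2.0, 3.0)
```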

In the above example, I added code to suppress log output. Moreover, if you use additional Spark packages, you need to import the corresponding dependencies, such as the mllib package in this example.

Further Study

Here are some materials for further study:

  • As mentioned earlier, Brunk’s blog post introduces various applications and usage methods of almond;
  • Another almond repository on GitHub, named “examples”, contains a variety of example notebooks;
  • almond’s website has pages on how to use it, such as a page on how to use Spark.
