Apache Spark Unit Testing Part 1 — Core Components

Eugene Lopatkin
Nov 20, 2019

This article shows how to use the test classes from Spark’s own repository for unit testing, and tries to fill the gap between code and documentation in the Spark unit testing domain. Spark has a large testing framework that lets developers test their code in a wide variety of cases. Most of the test classes of the core package are placed in the test sources of Spark’s core module.

Dependencies
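A minimal sbt sketch of the dependencies such tests typically need, assuming the test classes are pulled from the spark-core artifact published with the tests classifier. The version numbers are illustrative assumptions and should be replaced with the ones matching the Spark build in use:

```scala
// build.sbt (sketch; versions are assumptions, not recommendations)
val sparkVersion = "2.4.4"

libraryDependencies ++= Seq(
  // Spark itself, normally provided by the cluster at runtime
  "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
  // Spark's own test classes (SparkFunSuite, SharedSparkContext, ...)
  "org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "tests",
  // ScalaTest; pick the version Spark itself was built against
  "org.scalatest" %% "scalatest" % "3.0.3" % Test
)
```

Several of the helpers described below are declared private[spark], so the sketches in this article place their test code in a hypothetical package under org.apache.spark to get access to them.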

Core components

SparkFunSuite

Base abstract class for all unit tests in Spark; it handles common functionality and provides the features of FunSuite, ThreadAudit and Logging to tests. The thread audit runs automatically whenever a new test suite is created. The only prerequisite is that the test class extends SparkFunSuite. The default thread audit behavior can be overridden by setting enableAutoThreadAudit to false and calling the audit methods manually, if desired.
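A minimal sketch of both usages, assuming the spark-core test classes are on the test classpath. The package and suite names are hypothetical; the manual variant assumes the audit methods are doThreadPreAudit and doThreadPostAudit, as in Spark's ThreadAudit trait:

```scala
package org.apache.spark.sketch // hypothetical package

import org.apache.spark.SparkFunSuite

// Default behavior: the thread audit runs automatically around the suite.
class AutoAuditSketchSuite extends SparkFunSuite {
  test("trim removes surrounding whitespace") {
    assert("  spark ".trim === "spark")
  }
}

// Manual behavior: switch the automatic audit off and call it explicitly,
// for example when resources must be set up before the audit starts.
class ManualAuditSketchSuite extends SparkFunSuite {
  override protected val enableAutoThreadAudit = false

  protected override def beforeAll(): Unit = {
    doThreadPreAudit()
    super.beforeAll()
  }

  protected override def afterAll(): Unit = {
    try {
      super.afterAll()
    } finally {
      doThreadPostAudit()
    }
  }

  test("addition") {
    assert(1 + 1 === 2)
  }
}
```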

SharedSparkContext

Shares a local SparkContext between all tests in a suite and closes it at the end.

Example:
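A minimal sketch of a suite that mixes in SharedSparkContext, assuming the spark-core test classes are available. The package, suite name and data are illustrative; the trait exposes the shared context as sc:

```scala
package org.apache.spark.sketch // hypothetical package

import org.apache.spark.{SharedSparkContext, SparkFunSuite}

class SharedContextSketchSuite extends SparkFunSuite with SharedSparkContext {

  test("sum of an RDD") {
    // sc is created once before the first test of the suite
    val rdd = sc.parallelize(1 to 100, numSlices = 4)
    assert(rdd.sum() === 5050.0)
  }

  test("distinct values") {
    // the same SparkContext instance is reused here
    val rdd = sc.parallelize(Seq("a", "b", "a"))
    assert(rdd.distinct().count() === 2L)
  }
}
```

Sharing one context keeps the suite fast, at the price that tests cannot change the context configuration individually.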

LocalSparkContext

Manages a local sc SparkContext variable, correctly stopping it after each test.

Example:
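A minimal sketch of a suite that mixes in LocalSparkContext, assuming the spark-core test classes are available. The package, suite name and master settings are illustrative. Each test assigns its own context to sc, and the trait stops it afterwards:

```scala
package org.apache.spark.sketch // hypothetical package

import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite}

class LocalContextSketchSuite extends SparkFunSuite with LocalSparkContext {

  test("count with two local threads") {
    val conf = new SparkConf().setMaster("local[2]").setAppName("sketch-1")
    sc = new SparkContext(conf) // stopped by the trait after this test
    assert(sc.parallelize(1 to 10).count() === 10L)
  }

  test("map with a single local thread") {
    // a fresh context with a different configuration
    sc = new SparkContext("local", "sketch-2")
    val doubled = sc.parallelize(Seq(1, 2, 3)).map(_ * 2).collect().toSeq
    assert(doubled === Seq(2, 4, 6))
  }
}
```

This variant suits tests that need a differently configured context each time.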

JsonTestUtils

Helper for working with json4s library objects in tests.
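A minimal sketch of how this helper might be used, assuming it exposes assertValidDataInJson(validateJson, expectedJson) taking two json4s JValue arguments, as in the Spark sources at the time of writing. The package, suite name and JSON payloads are illustrative:

```scala
package org.apache.spark.sketch // hypothetical package

import org.apache.spark.{JsonTestUtils, SparkFunSuite}
import org.json4s.jackson.JsonMethods.parse

class JsonSketchSuite extends SparkFunSuite with JsonTestUtils {

  test("parsed documents with the same content are equal") {
    val actual = parse("""{"id":1,"name":"spark"}""")
    val expected = parse("""{ "id": 1, "name": "spark" }""")
    // fails with a readable diff if the two documents differ
    assertValidDataInJson(actual, expected)
  }
}
```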

Smuggle

Utility wrapper to “smuggle” objects into tasks while bypassing serialization. This is intended for testing purposes, primarily to make locks, semaphores, and other constructs that would not survive serialization available from within tasks. A Smuggle reference is itself serializable, but after being serialized and deserialized, it still refers to the same underlying “smuggled” object, as long as it was deserialized within the same JVM. This can be useful for tests that depend on the timing of task completion to be deterministic, since one can “smuggle” a lock or semaphore into the task, and then the task can block until the test gives the go-ahead to proceed via the lock.
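The blocking pattern described above can be sketched as follows, assuming Smuggle offers an apply factory and a smuggledObject accessor as in the Spark sources. The package, suite name and timeout are illustrative:

```scala
package org.apache.spark.sketch // hypothetical package

import java.util.concurrent.Semaphore

import scala.concurrent.Await
import scala.concurrent.duration._

import org.apache.spark.{LocalSparkContext, Smuggle, SparkContext, SparkFunSuite}

class SmuggleSketchSuite extends SparkFunSuite with LocalSparkContext {

  test("tasks wait until the test releases them") {
    sc = new SparkContext("local[2]", "smuggle-sketch")

    // A plain Semaphore would be copied by serialization; the smuggled one
    // stays the same instance because driver and tasks share one JVM.
    val gate = Smuggle(new Semaphore(0))

    val future = sc.parallelize(1 to 2, 2).map { x =>
      gate.smuggledObject.acquire() // block until the test gives the go-ahead
      x
    }.collectAsync()

    // No permits were released yet, so the job cannot have finished.
    assert(!future.isCompleted)

    gate.smuggledObject.release(2) // one permit per task
    assert(Await.result(future, 30.seconds) === Seq(1, 2))
  }
}
```

This only works in local mode, where tasks run in the same JVM as the test.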

1.2 Benchmark

Benchmark, BenchmarkBase

Private classes for benchmarking. BenchmarkBase is a base class for writing benchmark results to a file. For JDK 9+, the JDK major version number is added to the file names to distinguish the results. Benchmark is a utility class to benchmark components. An example of how to use this is:
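A minimal runnable sketch of that usage, assuming the Benchmark class from the org.apache.spark.benchmark package (Spark 2.4 and later) with its addCase and run methods. The package, object name and the two measured functions are illustrative:

```scala
package org.apache.spark.sketch // hypothetical package, needed for private[spark] access

import org.apache.spark.benchmark.Benchmark

object SumBenchmarkSketch {

  // First candidate: explicit while loop
  def sumWhile(n: Long): Long = {
    var i = 1L
    var acc = 0L
    while (i <= n) {
      acc += i
      i += 1
    }
    acc
  }

  // Second candidate: foldLeft over a range
  def sumFold(n: Long): Long = (1L to n).foldLeft(0L)(_ + _)

  def main(args: Array[String]): Unit = {
    val valuesPerIteration = 1000000L
    val benchmark = new Benchmark("Sum of 1..n", valuesPerIteration)
    // each case receives the iteration index, which is ignored here
    benchmark.addCase("while loop") { _ => sumWhile(valuesPerIteration) }
    benchmark.addCase("foldLeft") { _ => sumFold(valuesPerIteration) }
    benchmark.run()
  }
}
```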

This will output the average time to run each function and the rate of each function.

Example:
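A minimal sketch of a benchmark built on BenchmarkBase, assuming the Spark 3.x signature runBenchmarkSuite(mainArgs: Array[String]); in Spark 2.4 the method is declared without parameters. The package, object name and measured functions are illustrative:

```scala
package org.apache.spark.sketch // hypothetical package, needed for private[spark] access

import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}

object StringBuildBenchmarkSketch extends BenchmarkBase {

  // Builds a string of n characters with a StringBuilder
  def withBuilder(n: Int): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < n) {
      sb.append('x')
      i += 1
    }
    sb.toString
  }

  // Builds the same string with the * operator on String
  def withRepeat(n: Int): String = "x" * n

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("String building") {
      val n = 100000
      // passing `output` lets BenchmarkBase redirect results to a file
      val benchmark = new Benchmark("Build a string of n chars", n, output = output)
      benchmark.addCase("StringBuilder") { _ => withBuilder(n) }
      benchmark.addCase("String * n") { _ => withRepeat(n) }
      benchmark.run()
    }
  }
}
```

In the Spark sources, BenchmarkBase writes the results to a file only when the SPARK_GENERATE_BENCHMARK_FILES environment variable is set to 1; otherwise they go to standard output.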

Example of Benchmark result

1.3 Security

EncryptionFunSuite

Runs a test twice, initializing a SparkConf object with encryption off, then on. It’s ok for the test to modify the provided SparkConf.

Example:
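A minimal sketch of such a suite, assuming EncryptionFunSuite provides encryptionTest(name)(fn), where fn receives the prepared SparkConf. The package, suite name and job are illustrative:

```scala
package org.apache.spark.sketch // hypothetical package, needed for private[spark] access

import org.apache.spark.{LocalSparkContext, SparkContext, SparkFunSuite}
import org.apache.spark.security.EncryptionFunSuite

class EncryptionSketchSuite
  extends SparkFunSuite
  with LocalSparkContext
  with EncryptionFunSuite {

  // Registers two tests: one with I/O encryption off, one with it on.
  encryptionTest("word count via shuffle") { conf =>
    // modifying the provided conf is allowed
    conf.setMaster("local[2]").setAppName("encryption-sketch")
    sc = new SparkContext(conf)

    val counts = sc.parallelize(Seq("a", "b", "a"), 2)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()

    assert(counts === Map("a" -> 2, "b" -> 1))
  }
}
```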

1.4 Util

SparkConfWithEnv

Customized SparkConf that allows environment variables to be overridden.
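A minimal sketch, assuming SparkConfWithEnv takes a Map of overrides and overrides the private[spark] getenv method of SparkConf, as in the Spark sources. The package, suite name, variable name and path are illustrative:

```scala
package org.apache.spark.sketch // hypothetical package, needed for private[spark] access

import org.apache.spark.SparkFunSuite
import org.apache.spark.util.SparkConfWithEnv

class ConfWithEnvSketchSuite extends SparkFunSuite {

  test("overridden environment variable is visible through the conf") {
    // SKETCH_HOME is a made-up variable name used only for illustration
    val conf = new SparkConfWithEnv(Map("SKETCH_HOME" -> "/tmp/sketch"))
    assert(conf.getenv("SKETCH_HOME") === "/tmp/sketch")
  }
}
```

This avoids mutating the real process environment, which the JVM does not support in a portable way.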
