Apache Spark Unit Testing Part 1 — Core Components
This article shows how to use Spark's own repository classes for unit testing and aims to fill the gap between code and documentation in the Spark unit testing domain. Spark ships a large test framework that lets developers test their code in a wide variety of cases. Most of the test classes of the core package are placed here.
Dependencies
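The Spark test classes are not part of the main spark-core jar; they are published as a separate artifact with the "tests" classifier. The sbt fragment below is a minimal sketch of how that artifact could be pulled in. The version numbers are placeholders chosen for illustration, and depending on your Spark version additional test-scoped dependencies may be needed.

```scala
// build.sbt (sketch); the version numbers are assumptions, not taken from the article
val sparkVersion = "3.5.1"

libraryDependencies ++= Seq(
  // Main Spark core artifact
  "org.apache.spark" %% "spark-core" % sparkVersion,
  // Test classes of the same module, published under the "tests" classifier
  "org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "tests",
  // ScalaTest, which SparkFunSuite builds on
  "org.scalatest" %% "scalatest" % "3.2.17" % Test
)
```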
Core components
SparkFunSuite
Base abstract class for all unit tests in Spark, handling common functionality. It provides functionality from FunSuite, ThreadAudit and Logging for tests. The thread audit normally happens automatically when a new test suite is created. The only prerequisite is that the test class extends SparkFunSuite. The default thread audit behavior can be overridden by setting enableAutoThreadAudit to false and calling the audit methods manually, if desired.
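As a rough illustration of the manual audit option, the sketch below disables the automatic audit and calls the audit methods itself. It assumes the Spark test jar is on the test classpath and that your Spark version exposes doThreadPreAudit and doThreadPostAudit as protected members of ThreadAudit; the suite name and the test body are made up for the example.

```scala
import org.apache.spark.SparkFunSuite

// Hypothetical suite that takes over the thread audit from SparkFunSuite.
class ManualThreadAuditSuite extends SparkFunSuite {

  // Switch off the automatic audit performed in beforeAll/afterAll.
  override protected val enableAutoThreadAudit = false

  protected override def beforeAll(): Unit = {
    // Record the threads that exist before the suite creates any of its own.
    doThreadPreAudit()
    super.beforeAll()
  }

  protected override def afterAll(): Unit = {
    // Report threads that are still alive after the suite has cleaned up.
    try super.afterAll() finally doThreadPostAudit()
  }

  test("string reversal is its own inverse") {
    val s = "spark"
    assert(s.reverse.reverse === s)
  }
}
```

A suite that is happy with the default behavior only needs the class declaration and its test blocks.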
SharedSparkContext
Shares a local SparkContext between all tests in a suite and closes it at the end.
Example:
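A minimal sketch of a suite that mixes in SharedSparkContext, assuming the Spark test jar is on the test classpath. The suite name and test contents are invented for illustration; sc is the shared context provided by the trait.

```scala
import org.apache.spark.{SharedSparkContext, SparkFunSuite}

// Hypothetical suite: both tests use the same SparkContext, exposed as `sc`.
class SharedContextSuite extends SparkFunSuite with SharedSparkContext {

  test("sum of a small RDD") {
    val rdd = sc.parallelize(1 to 100, numSlices = 4)
    assert(rdd.reduce(_ + _) === 5050)
  }

  test("count on the same shared context") {
    val rdd = sc.parallelize(Seq("a", "b", "c"))
    assert(rdd.count() === 3L)
  }
}
```

Because the context is shared, tests in such a suite should avoid leaving state behind (cached RDDs, changed local properties) that another test could observe.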
LocalSparkContext
Manages a local sc SparkContext variable, correctly stopping it after each test.
Example:
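The following sketch shows one way LocalSparkContext might be used when each test needs its own configuration. It assumes the Spark test jar is on the test classpath; the suite name, master URL and app name are illustrative choices.

```scala
import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite}

// Hypothetical suite: every test assigns `sc`, and the trait stops it afterwards.
class LocalContextSuite extends SparkFunSuite with LocalSparkContext {

  test("each test builds its own context") {
    val conf = new SparkConf(false)
      .setMaster("local[2]")
      .setAppName("local-context-example")
    // Assigning to `sc` hands the context over to LocalSparkContext for cleanup.
    sc = new SparkContext(conf)

    val doubled = sc.parallelize(1 to 10).map(_ * 2).collect()
    assert(doubled.sum === 110)
  }
}
```

Compared with SharedSparkContext, this costs a context start-up per test but keeps tests fully isolated from each other.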
JsonTestUtils
Helper for working with json4s library objects in tests.
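A small sketch of how the helper could be used to compare two JSON documents. It assumes that JsonTestUtils in your Spark version is a trait offering assertValidDataInJson(validateJson, expectedJson), and that json4s-jackson is available (it comes with Spark); check the source of your version before relying on it. The suite name and the JSON content are made up.

```scala
import org.apache.spark.{JsonTestUtils, SparkFunSuite}
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Hypothetical suite comparing a produced JSON value with an expected one.
class JsonComparisonSuite extends SparkFunSuite with JsonTestUtils {

  test("produced JSON matches the expected document") {
    val produced = parse("""{"id": 1, "name": "stage"}""")
    val expected = parse("""{"id": 1, "name": "stage"}""")
    // Assumed helper: fails the test if the two json4s values differ.
    assertValidDataInJson(produced, expected)
  }
}
```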
Smuggle
Utility wrapper to “smuggle” objects into tasks while bypassing serialization. This is intended for testing purposes, primarily to make locks, semaphores, and other constructs that would not survive serialization available from within tasks. A Smuggle reference is itself serializable, but after being serialized and deserialized, it still refers to the same underlying “smuggled” object, as long as it was deserialized within the same JVM. This can be useful for tests that depend on the timing of task completion to be deterministic, since one can “smuggle” a lock or semaphore into the task, and then the task can block until the test gives the go-ahead to proceed via the lock.
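The blocking pattern described above can be sketched as follows. This assumes the Spark test jar is on the test classpath, that Smuggle offers an apply factory and a smuggledObject accessor in your Spark version, and that tasks run in the same JVM as the test (local mode, as with SharedSparkContext). The suite name and timeout are illustrative.

```scala
import java.util.concurrent.Semaphore

import scala.concurrent.Await
import scala.concurrent.duration._

import org.apache.spark.{SharedSparkContext, Smuggle, SparkFunSuite}

// Hypothetical suite: tasks block on a semaphore that the test controls.
class SmuggledSemaphoreSuite extends SparkFunSuite with SharedSparkContext {

  test("tasks wait until the test releases a smuggled semaphore") {
    // A plain Semaphore captured in the closure would be copied by serialization,
    // so the task would wait on a different object. Smuggle keeps the same instance.
    val gate = Smuggle(new Semaphore(0))

    // Each task acquires one permit before returning its element.
    val job = sc.parallelize(1 to 2, 2).map { x =>
      gate.smuggledObject.acquire()
      x
    }.collectAsync()

    // No permit has been released yet, so the job cannot have finished.
    assert(!job.isCompleted)

    // Give the go-ahead to both tasks and wait for the result.
    gate.smuggledObject.release(2)
    assert(Await.result(job, 30.seconds) === Seq(1, 2))
  }
}
```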
1.2 Benchmark
Benchmark, BenchmarkBase
Private classes for benchmarking. BenchmarkBase is a base class for generating benchmark results to a file. For JDK 9+, the JDK major version number is added to the file names to distinguish the results. Benchmark is a utility class to benchmark components. Running a benchmark outputs the average time to run each function and the rate of each function.
Example:
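A minimal sketch of a benchmark built on these two classes. It assumes a Spark 3.x layout, where both classes live in org.apache.spark.benchmark and runBenchmarkSuite takes the main arguments. Because Benchmark is private to Spark, the sketch places itself in a hypothetical package under org.apache.spark. The object name and the two functions being compared are invented for illustration.

```scala
// Hypothetical package: Benchmark is private[spark], so the code has to live under org.apache.spark.
package org.apache.spark.benchmark.example

import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}

object SumBenchmark extends BenchmarkBase {

  // Two ways of summing 0 until n, used as the benchmark cases below.
  def sumWithWhile(n: Long): Long = {
    var i = 0L
    var acc = 0L
    while (i < n) { acc += i; i += 1 }
    acc
  }

  def sumWithFold(n: Long): Long = (0L until n).foldLeft(0L)(_ + _)

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("Summing a range of longs") {
      val n = 1000000L
      // `output` comes from BenchmarkBase; it is set when results are written to a file.
      val benchmark = new Benchmark("sum 0 until n", n, output = output)
      benchmark.addCase("while loop") { _ => sumWithWhile(n) }
      benchmark.addCase("foldLeft") { _ => sumWithFold(n) }
      benchmark.run()
    }
  }
}
```

The second constructor argument is the number of values processed per iteration, which Benchmark uses to derive the rate column of its report.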
Example of Benchmark result
1.3 Security
EncryptionFunSuite
Runs a test twice, initializing a SparkConf object with encryption off, then on. It is fine for the test to modify the provided SparkConf.
Example:
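The sketch below registers one logical test that runs once per encryption setting. It assumes the trait lives in org.apache.spark.security and exposes encryptionTest(name)(fn) in your Spark version; since the trait is private to Spark, the suite sits in a hypothetical package under org.apache.spark. The suite name, master URL and word-count job are illustrative.

```scala
// Hypothetical package: EncryptionFunSuite is private[spark].
package org.apache.spark.security.example

import org.apache.spark.{LocalSparkContext, SparkContext, SparkFunSuite}
import org.apache.spark.security.EncryptionFunSuite

class EncryptedShuffleSuite
  extends SparkFunSuite with LocalSparkContext with EncryptionFunSuite {

  // Registered twice by the trait: once with encryption off, once with it on.
  encryptionTest("word count gives the same result") { conf =>
    // The provided conf may be modified freely, here to set master and app name.
    conf.setMaster("local[2]").setAppName("encryption-example")
    sc = new SparkContext(conf)

    val counts = sc.parallelize(Seq("a", "b", "a"), 2)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()

    assert(counts === Map("a" -> 2, "b" -> 1))
  }
}
```

Mixing in LocalSparkContext here means each of the two runs gets a fresh context built from its own conf.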
1.4 Util
SparkConfWithEnv
Customized SparkConf that allows env variables to be overridden.
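As a brief illustration, the sketch below overrides one environment variable and reads it back. It assumes SparkConfWithEnv takes a Map[String, String] and lives in org.apache.spark.util. Because SparkConf.getenv is private to Spark, the suite is placed in a hypothetical package under org.apache.spark. The variable value is made up.

```scala
// Hypothetical package: SparkConf.getenv is private[spark].
package org.apache.spark.util.example

import org.apache.spark.SparkFunSuite
import org.apache.spark.util.SparkConfWithEnv

class EnvOverrideSuite extends SparkFunSuite {

  test("overridden environment variable is visible through the conf") {
    // The map entries take precedence over the real process environment.
    val conf = new SparkConfWithEnv(Map("SPARK_LOCAL_DIRS" -> "/tmp/spark-test"))
    assert(conf.getenv("SPARK_LOCAL_DIRS") === "/tmp/spark-test")
  }
}
```

This is mainly useful for exercising code paths that read settings from the environment without having to change the environment of the test JVM.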