A significant trend in today's IT business is readiness to work with hot data, whose lifetime from the moment it appears can be less than a second.
Let’s say you come to a store and take out a loan to buy a phone. You want the loan on favorable terms, and the bank wants to lend to a verified client. The time window in which you need the credit money is relatively short. Here is an example from the telecom domain: you have run out of money, and at this moment your…
Half a year ago, some guy hijacked a Yandex Drive car that I had rented. Hijacked, because the application did not close the car. This has happened many times before, and car-sharing users know that Yandex is in no hurry to check whether you are still using a car. When a car is parked for a long time during the day, they call. But if you do not close the car at night, rest assured that in the morning you will get a "pleasant" surprise costing a tidy sum.
So, a local Dominic Toretto not only stole the car, he also dashed…
In my opinion, the best way to start testing your Spark application is by preparing integration tests. There are many internal and external tools for building this kind of test. I prefer to use the internal testing tools of the frameworks already in use, giving them higher priority than external testing tools. Hadoop developers have prepared a wide range of mini clusters that allow you to start your own cluster inside your tests and check integration aspects:
The most common case in the context of Apache…
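As a minimal sketch of the mini-cluster approach, the test below starts an in-process HDFS cluster and runs a Spark job against it. It assumes the `hadoop-minicluster` and `spark-sql` artifacts plus ScalaTest are on the test classpath; the suite and path names are illustrative.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hdfs.MiniDFSCluster
import org.apache.spark.sql.SparkSession
import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite

class HdfsIntegrationSuite extends AnyFunSuite with BeforeAndAfterAll {
  private var hdfs: MiniDFSCluster = _
  private var spark: SparkSession = _

  override def beforeAll(): Unit = {
    // Start a one-DataNode HDFS mini cluster inside the test JVM
    hdfs = new MiniDFSCluster.Builder(new Configuration()).numDataNodes(1).build()
    spark = SparkSession.builder().master("local[2]").appName("it-test").getOrCreate()
  }

  test("round-trip a dataset through the mini HDFS") {
    import spark.implicits._
    // The NameNode port is assigned dynamically, so ask the cluster for it
    val path = s"hdfs://localhost:${hdfs.getNameNodePort}/data/numbers"
    Seq(1, 2, 3).toDS().write.parquet(path)
    assert(spark.read.parquet(path).as[Int].collect().sorted === Array(1, 2, 3))
  }

  override def afterAll(): Unit = {
    spark.stop()
    hdfs.shutdown()
  }
}
```

Starting the cluster once per suite keeps the test fast while still exercising the real HDFS client path instead of the local filesystem.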
In our company, we have hundreds of machine learning (ML) models and calculation scripts running on a schedule in production. Our Data Engineering division took on the task of building a CI/CD component best suited to our goals:
To achieve our goals, we selected Kubernetes as the best Docker orchestration system, Airflow as one of the fastest-growing schedulers, and Jenkins as one of the most popular continuous integration / continuous delivery…
The third part of this article series is about Spark Streaming unit testing. The series aims to fill the gap between code and documentation in the Spark unit testing domain. Spark has a huge testing framework that allows developers to test their code in a wide variety of cases.
There is one package, spark-streaming, and its tests are placed here.
This is the base trait for Spark Streaming test suites. It provides basic functionality to run a user-defined set of inputs through user-defined stream operations and verify the output. It extends
Additional service classes:
This is an input stream just for the…
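To illustrate the base trait in use, here is a hedged sketch of a suite built on TestSuiteBase. It assumes the spark-streaming test-jar is on the test classpath, since TestSuiteBase lives in Spark's own test sources rather than in the published compile-scope artifact; the suite name and word-count operation are illustrative.

```scala
import org.apache.spark.streaming.TestSuiteBase
import org.apache.spark.streaming.dstream.DStream

class WordCountStreamSuite extends TestSuiteBase {
  test("count words across batches") {
    // Each inner Seq becomes one micro-batch of the input stream
    val input = Seq(Seq("a a b"), Seq("b"))
    val operation = (lines: DStream[String]) =>
      lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    val expected = Seq(Seq(("a", 2), ("b", 1)), Seq(("b", 1)))
    // testOperation feeds the batches through a test input stream, collects
    // the results with a test output stream, and compares them per batch;
    // useSet = true ignores ordering within a batch
    testOperation(input, operation, expected, useSet = true)
  }
}
```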
The second part of this article series is about how to use Spark repository classes for unit testing. The series aims to fill the gap between code and documentation in the Spark unit testing domain. Spark has a huge testing framework that allows developers to test their code in a wide variety of cases.
The Spark SQL package has four sub-projects, each of which has its own test classes:
In the context of testing your own Spark jobs, we will discuss only three of them (core, catalyst, hive).
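As a sketch of what the sql/core test classes give you, the suite below reuses QueryTest and SharedSparkSession, assuming the spark-sql test-jar is on the test classpath (both helpers live in Spark's sql/core test sources); the suite name and query are illustrative.

```scala
import org.apache.spark.sql.{QueryTest, Row}
import org.apache.spark.sql.test.SharedSparkSession

class MyQuerySuite extends QueryTest with SharedSparkSession {
  // testImplicits brings in toDF and the $"col" syntax, bound to the shared session
  import testImplicits._

  test("filter keeps only even numbers") {
    val df = Seq(1, 2, 3, 4).toDF("n")
    // checkAnswer compares the result to the expected rows, ignoring row order
    checkAnswer(df.filter($"n" % 2 === 0), Seq(Row(2), Row(4)))
  }
}
```

Reusing the shared session avoids the cost of starting a new SparkSession per test class, which is what makes suites like this fast enough to run on every build.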
This article is about how to use Spark's own repository classes for unit testing, and it aims to fill the gap between code and documentation in the Spark unit testing domain. Spark has a huge testing framework that allows developers to test their code in a wide variety of cases. Most of the test classes of the core package are placed here.
This is the base abstract class for all unit tests in Spark, handling common functionality. It provides functionality from Logging for tests. A thread audit normally happens here automatically when a new test suite is created. The only prerequisite is that the test…
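A core-package style test built on this class might look like the sketch below. It assumes the spark-core test-jar is on the test classpath, since SparkFunSuite and LocalSparkContext are Spark's own test helpers; the suite name is illustrative.

```scala
import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite}

class RddSumSuite extends SparkFunSuite with LocalSparkContext {
  test("sum of an RDD") {
    // LocalSparkContext stops `sc` after each test, so the automatic
    // thread audit does not flag leaked SparkContext threads
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))
    assert(sc.parallelize(1 to 4).sum() === 10.0)
  }
}
```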