Issue
I am developing an API in Java that reads data from Kafka and writes it to blob storage using Spark Structured Streaming, and I could not find a way to write unit tests for it. I have a reader class which returns a Dataset and a writer class which takes a Dataset as input and writes it to blob storage in the specified format. I saw some blogs on MemoryStream, but I don't think it will suffice for my case.
Thanks in advance.
Solution
You can refer to this answer on how to use memory streams for unit testing - Unit Test - structured streaming
Also, you can look at spark-testing-base from Holden Karau - Spark testing base
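For the batch-style pieces of your pipeline (for example, the transformations applied after reading), a minimal sketch of a spark-testing-base JUnit test could look like this. JavaDataFrameSuiteBase and assertDataFrameEquals are the library's Java helpers as I recall them from its README; the class name and the queries here are illustrative placeholders only.

import com.holdenkarau.spark.testing.JavaDataFrameSuiteBase;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.junit.Test;

// The suite base spins up a local Spark context and exposes sqlContext().
public class TransformationEqualityTest extends JavaDataFrameSuiteBase {
    @Test
    public void producesExpectedFrame() {
        Dataset<Row> expected = sqlContext().sql("select 1 as testCol1, 1 as testCol2");
        Dataset<Row> actual = sqlContext().sql("select 1 as testCol1, 1 as testCol2");
        assertDataFrameEquals(expected, actual); // fails on schema or row mismatches
    }
}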
Beyond those, you can mock the streaming DataFrame coming from Kafka and run test cases against the transformations your code applies on top of that DataFrame.
Sample:
import java.util.Arrays;
import org.apache.spark.sql.*;
import org.apache.spark.sql.execution.streaming.MemoryStream;
import scala.collection.JavaConverters;

// Builds a streaming DataFrame backed by a MemoryStream instead of Kafka;
// sqlContext() is assumed to return the test SparkSession's SQLContext.
static Dataset<Row> createTestStreamingDataFrame() {
    MemoryStream<String> testStream = new MemoryStream<>(100, sqlContext(), Encoders.STRING());
    // Convert the Java list to a Scala Seq, which addData expects.
    testStream.addData(JavaConverters.asScalaBufferConverter(Arrays.asList("1,1", "2,2", "3,3")).asScala());
    return testStream.toDF().selectExpr(
        "cast(split(value,'[,]')[0] as int) as testCol1",
        "cast(split(value,'[,]')[1] as int) as testCol2");
}
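To exercise the flow end to end, a minimal JUnit sketch could start the mocked stream against an in-memory sink and assert on the output. Two assumptions here, neither from the original code: spark is a SparkSession available to the test, and the pass-through line is where your own transformation under test would go.

import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.junit.Assert;
import org.junit.Test;

@Test
public void transformationHandlesKafkaLikeInput() throws Exception {
    // Replace this direct pass-through with your transformation under test.
    Dataset<Row> output = createTestStreamingDataFrame();
    StreamingQuery query = output.writeStream()
            .format("memory")            // in-memory sink, queryable as a temp table
            .queryName("test_output")
            .outputMode("append")
            .start();
    query.processAllAvailable();         // block until the buffered rows are processed
    // `spark` is the test's SparkSession (assumed field on the test class).
    List<Row> rows = spark.sql("select * from test_output").collectAsList();
    Assert.assertEquals(3, rows.size()); // three records were added to the stream
    query.stop();
}

The memory sink registers the query name as a temporary table, so ordinary SQL assertions work on whatever the transformation produced.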
Answered By - act_coder