Issue
I have a Spring Batch application like the one below (the table name and query have been edited to generic names). The source table in Oracle contains 50 million records, which I want to copy to a NoSQL database. When I execute this program, it reads only 7,500 records, i.e. three times the chunk size, and never reads the remaining records.
@EnableBatchProcessing
@SpringBootApplication
@EnableAutoConfiguration
public class MultiThreadPagingApp extends DefaultBatchConfigurer {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    public DataSource dataSource;

    @Bean
    public DataSource dataSource() {
        final DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("oracle.jdbc.OracleDriver");
        dataSource.setUrl("jdbc:oracle:thin:@***********");
        dataSource.setUsername("user");
        dataSource.setPassword("password");
        return dataSource;
    }

    @Override
    public void setDataSource(DataSource dataSource) {}

    @Bean
    @StepScope
    ItemReader<UserModel> dbReader() throws Exception {
        JdbcPagingItemReader<UserModel> reader = new JdbcPagingItemReader<>();
        final SqlPagingQueryProviderFactoryBean sqlPagingQueryProviderFactoryBean =
                new SqlPagingQueryProviderFactoryBean();
        sqlPagingQueryProviderFactoryBean.setDataSource(dataSource);
        sqlPagingQueryProviderFactoryBean.setSelectClause("select * ");
        sqlPagingQueryProviderFactoryBean.setFromClause("from user");
        sqlPagingQueryProviderFactoryBean.setWhereClause("where id>0");
        sqlPagingQueryProviderFactoryBean.setSortKey("name");
        reader.setQueryProvider(sqlPagingQueryProviderFactoryBean.getObject());
        reader.setDataSource(dataSource);
        reader.setPageSize(2500);
        reader.setRowMapper(new BeanPropertyRowMapper<>(UserModel.class));
        reader.afterPropertiesSet();
        reader.setSaveState(true);
        System.out.println("Reading users anonymized in chunks of " + 2500);
        return reader;
    }

    @Bean
    public Dbwriter writer() {
        return new Dbwriter(); // I have another class for this
    }

    @Bean
    public Step step1() throws Exception {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(4);
        taskExecutor.setMaxPoolSize(10);
        taskExecutor.afterPropertiesSet();
        return this.stepBuilderFactory.get("step1")
                .<UserModel, UserModel>chunk(2500)
                .reader(dbReader())
                .writer(writer())
                .taskExecutor(taskExecutor)
                .build();
    }

    @Bean
    public Job multithreadedJob() throws Exception {
        return this.jobBuilderFactory.get("multithreadedJob")
                .start(step1())
                .build();
    }

    @Bean
    public PlatformTransactionManager getTransactionManager() {
        return new ResourcelessTransactionManager();
    }

    @Bean
    public JobRepository getJobRepo() throws Exception {
        return new MapJobRepositoryFactoryBean(getTransactionManager()).getObject();
    }

    public static void main(String[] args) {
        SpringApplication.run(MultiThreadPagingApp.class, args);
    }
}
Can you help me efficiently read all the records using Spring Batch, or suggest another approach to handle this? I tried the approach mentioned here: http://techdive.in/java/jdbc-handling-huge-resultset. It took 120 minutes to read and save all the records with a single-threaded application. Since Spring Batch seems like a good fit for this, I assume this scenario can be handled in much less time.
Solution
You are setting the saveState flag to true on a JdbcPagingItemReader (by the way, it should be set before calling afterPropertiesSet) and using this reader in a multi-threaded step. However, the documentation says to set this flag to false in a multi-threaded context: the reader's position is saved in the step's execution context, and that shared state is not reliable when several threads update it concurrently, which is the likely reason the reader stops after only a few chunks.
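For reference, here is a minimal sketch of the corrected reader bean; everything elided is unchanged from the question (query provider, data source, page size, row mapper):

    @Bean
    @StepScope
    ItemReader<UserModel> dbReader() throws Exception {
        JdbcPagingItemReader<UserModel> reader = new JdbcPagingItemReader<>();
        // ... same query provider, data source, page size and row mapper as in the question ...
        reader.setSaveState(false);  // documented requirement for multi-threaded steps
        reader.afterPropertiesSet(); // called last, once all properties are set
        return reader;
    }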
Multi-threading with database readers is usually not the best option; I would recommend using partitioning in your case, as sketched below.
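To give an idea of what that could look like (a sketch under assumptions, not code from the original answer): partition the id range so that each worker thread reads a disjoint slice of the table. The ColumnRangePartitioner below is a hypothetical helper modeled on the one in the Spring Batch samples; the id bounds (1 to 50,000,000) and the grid size of 4 are assumptions.

    import java.util.HashMap;
    import java.util.Map;
    import org.springframework.batch.core.partition.support.Partitioner;
    import org.springframework.batch.item.ExecutionContext;

    // Hypothetical partitioner: splits the id range [min, max] into gridSize slices.
    public class ColumnRangePartitioner implements Partitioner {

        private final long min;
        private final long max;

        public ColumnRangePartitioner(long min, long max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public Map<String, ExecutionContext> partition(int gridSize) {
            Map<String, ExecutionContext> partitions = new HashMap<>();
            long targetSize = (max - min) / gridSize + 1;
            long start = min;
            for (int i = 0; i < gridSize; i++) {
                ExecutionContext context = new ExecutionContext();
                context.putLong("minId", start);
                context.putLong("maxId", Math.min(start + targetSize - 1, max));
                partitions.put("partition" + i, context);
                start += targetSize;
            }
            return partitions;
        }
    }

In the configuration class from the question, a manager step would then fan out to a worker step, with each partition's reader bound to its id range through step-scoped late binding:

    @Bean
    public Step managerStep() throws Exception {
        return stepBuilderFactory.get("managerStep")
                .partitioner("workerStep", new ColumnRangePartitioner(1, 50_000_000))
                .step(workerStep())
                .gridSize(4) // one partition per worker thread; tune to your hardware
                .taskExecutor(new SimpleAsyncTaskExecutor())
                .build();
    }

    @Bean
    public Step workerStep() throws Exception {
        return stepBuilderFactory.get("workerStep")
                .<UserModel, UserModel>chunk(2500)
                .reader(partitionedReader(null, null))
                .writer(writer())
                .build();
    }

    @Bean
    @StepScope
    public JdbcPagingItemReader<UserModel> partitionedReader(
            @Value("#{stepExecutionContext['minId']}") Long minId,
            @Value("#{stepExecutionContext['maxId']}") Long maxId) throws Exception {
        SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
        provider.setDataSource(dataSource);
        provider.setSelectClause("select *");
        provider.setFromClause("from user");
        provider.setWhereClause("where id >= :minId and id <= :maxId");
        provider.setSortKey("id"); // a unique sort key keeps paging deterministic

        JdbcPagingItemReader<UserModel> reader = new JdbcPagingItemReader<>();
        reader.setQueryProvider(provider.getObject());
        reader.setDataSource(dataSource);
        reader.setPageSize(2500);
        reader.setRowMapper(new BeanPropertyRowMapper<>(UserModel.class));
        Map<String, Object> params = new HashMap<>();
        params.put("minId", minId);
        params.put("maxId", maxId);
        reader.setParameterValues(params);
        reader.afterPropertiesSet();
        return reader;
    }

With this setup, each partition gets its own step execution and its own step-scoped reader instance, so the threads never contend over shared reader state, and progress is tracked per partition.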
Answered By - Mahmoud Ben Hassine
Answer Checked By - Clifford M. (JavaFixing Volunteer)