Issue
Im trying to insert over 200k rows to sql database, each row represent a card info(70+ string field). Within a Large TXT File. I'm (new Java Developer) facing a quite hard time in this, My approach:
- read File
File file = ReadFile.loadCardFile(pathName);
- convert File to stream
Stream<String> cardsStream = new BufferedReader(new InputStreamReader(new FileInputStream(file), ("UTF-8"))).lines());
- get each line in string array (the card info splitted by '|' and may or maynot that field is spaced)
cardsStream.forEach(s -> {
String[] card = Arrays.stream(s.split("\\|")).map(String::trim).toArray(String[]::new);
- insert each line (card data)
numberOfRows = insertCardService.setCard(card, numberOfRows);
- setCard is to map row data to its columns then I save each card
CardService.save(Card);
with this approach it takes up to 2h which is really really Long time
Is there any advice to better approach or could you provide me with links to read code it better?
oh btw I want to use batch insert to shorten time significantly but I think my way of reading the file is wrong! Thanks in advance!!
Solution
JPA is the wrong tool for this kind of operation. While it is probably possible to make it fast with JPA it is unnecessary difficult to do this. JPA works best in a workflow where you load some entities, edit some attributes and let JPA figure out which exact updates are necessary. For this JPA does a lot of caching which might cost considerable resources.
But here it seems you just want to pump some relevant amount of data into the database. You don't need JPA to figure out what to do, it's all insert. You don't need JPAs cache.
I recommend Springs JdbcTemplate
or NamedParameterJdbcTemplate
.
This probably already speeds up things considerable.
Once that works consider the following:
- Batch inserts, i.e. sending just one statement to the database. See https://mkyong.com/spring/spring-jdbctemplate-batchupdate-example/ Note that some database need special driver argument to properly handle batch updates.
- Doing intermittent commits. In general commits cost performance, because it forces databases to actually write data. But to long transaction might cause trouble as well, especially when the database is doing other stuff as well and in case of errors/rollbacks.
- You need more control over your batches, take a look at Spring Batch.
Answered By - Jens Schauder