Issue
We use a SQL Server Database.
When attempting to run a query on a table with 100M records, using 27GB of memory, Hibernate will translate the query, whether it's a findById
or a findByFieldName
into something like this:
select sbtinvento0_.cust_nbr as cust_nbr1_10_0_, sbtinvento0_.ean_upc as ean_upc2_10_0_, sbtinvento0_.rep_bal_qty as rep_bal_3_10_0_ from odh_inv.sbt_inven sbtinvento0_ where sbtinvento0_.cust_nbr=? and sbtinvento0_.ean_upc=?
What happens now is that this query lasts about 7 seconds. Both fields in the database are of type string.
We were able to reproduce this behavior in DBeaver:
SELECT CUST_NBR, EAN_UPC, REP_BAL_QTY
FROM ODH_INV.SBT_INVEN
WHERE CUST_NBR = 0000100974 AND EAN_UPC = '0042823091698';
This query will also last 7 seconds. But, if the query is done in the following way, it only takes 0.7 seconds:
SELECT CUST_NBR, EAN_UPC, REP_BAL_QTY
FROM CommonSQLDevPrj.ODH_INV.SBT_INVEN
WHERE CUST_NBR = '0000100974' AND EAN_UPC = '0042823091698';
This behavior is repeated across multiple applications/environments/databases, on tables with millions of lines, on queries automatically created by Hibernate when using findById
or findByFieldName
with String parameters.
I tried googling this issue but I couldn't manage to find anything helpful. Is there any property that can be set in the application.yml file to force apostrophes to be used on String parameters?
I tried the following method:
@Query("SELECT si FROM SbtInventory si WHERE sbtInventoryId.customerNumber = ':customerNumber' and sbtInventoryId.eanUpc = ':eanUpc'")
Optional<SbtInventory> findByCustomerNumberAndEanUpc(String customerNumber, String eanUpc);
But the query runs as is, and it doesn't substitute the query parameters with the values from the function parameters. I tried using ?1 and ?2, or escaping the apostrophe, none of them worked.
I also tried some workarounds using Entity Manager and Session.
First workaround, using Entity Manager, does work, in under 1 second. The issue is that I have to add the parameters manually in the query. In this case it's not an issue, as this is a job and we have a guarantee that there won't be any SQL Injection here, but this method is not recommended for this exact reason, and we won't be able to use it on APIs.
private final EntityManager em;
public Optional<SbtInventory> get(String customerNumber, String eanUpc) {
@SuppressWarnings("unchecked")
List<SbtInventory> sbtInventory = em
.createQuery("SELECT si FROM SbtInventory si WHERE sbtInventoryId.customerNumber = '" + customerNumber
+ "' and sbtInventoryId.eanUpc = '" + eanUpc + "'")
.getResultList();
if (sbtInventory.size() == 0) {
return Optional.empty();
} else {
return Optional.of(sbtInventory.get(0));
}
}
The second method that is recommended instead, using Session, also works, but the query takes again 7 seconds to run:
private final EntityManager em;
@Transactional
public Optional<SbtInventory> get(String customerNumber, String eanUpc) {
Session session = null;
if (em == null || (session = em.unwrap(Session.class)) == null) {
throw new NullPointerException();
}
@SuppressWarnings("unchecked")
List<SbtInventory> sbtInventory = session.createQuery(
"SELECT si FROM SbtInventory si WHERE sbtInventoryId.customerNumber = :customerNumber and sbtInventoryId.eanUpc = :eanUpc")
.setParameter("customerNumber", customerNumber, StandardBasicTypes.STRING)
.setParameter("eanUpc", eanUpc, StandardBasicTypes.STRING).getResultList();
if (sbtInventory.size() == 0) {
return Optional.empty();
} else {
return Optional.of(sbtInventory.get(0));
}
}
But all of this is undesired, as it adds a lot of extra code and extra handling on our queries, so I'm still hoping there's some hibernate property we can set so it adds apostrophes by itself.
EDIT: I tried Jens suggestion with cast:
SELECT CUST_NBR, EAN_UPC, REP_BAL_QTY
FROM ODH_INV.SBT_INVEN
WHERE CUST_NBR = CAST(0000100974, varchar) AND EAN_UPC = '0042823091698';
This query runs fast, but brings back no result. I thought it might be because of removing leading zeroes, so I tried:
SELECT CUST_NBR, EAN_UPC, REP_BAL_QTY
FROM ODH_INV.SBT_INVEN
WHERE CUST_NBR = RIGHT(CONCAT('0000000000', CAST(0000100974 AS varchar)), 10) AND EAN_UPC = '0042823091698';
This is finally working, but it would imply to manually write the query, which gets me to @Query in a Repository instead of using entity manager, which is an upgrade ( can't test it till Monday if it's actually working from Hibernate ), but still implies writing every sensitive query like this, instead of simply using queries formed by function name :(
EDIT 2: I'm going to reply to some of gadget's questions here
Any customization in the hibernate mapping of customerNumber in your SbtInventory entity? Could you please post the mapping?
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
@ToString
@Entity
@Builder
@Table(name = "SBT_INVEN", schema = "ODH_INV")
public class SbtInventory {
@EmbeddedId
private SbtInventoryIdentity sbtInventoryId;
@Column(name = "REP_BAL_QTY")
private Integer reportedBalanceQuantity;
}
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
@ToString
@EqualsAndHashCode(callSuper = false)
@Embeddable
public class SbtInventoryIdentity implements Serializable {
/**
*
*/
private static final long serialVersionUID = -892835625356278964L;
@Column(name = "CUST_NBR")
private String customerNumber;
@Column(name = "EAN_UPC")
private String eanUpc;
}
Also what dialect are you using for hibernate?
Full config properties for the database connection:
datasource:
url: jdbc:sqlserver://db-url:db-port;databaseName=db-name
username: user
password: pass
driver-class-name: com.microsoft.sqlserver.jdbc.SQLServerDriver
hikari:
data-source-properties:
cachePrepStmts: true
prepStmtCacheSize: 250
prepStmtCacheSqlLimit: 2048
useServerPrepStmts: false
maximum-pool-size: 50
jpa:
database-platform: org.hibernate.dialect.SQLServer2008Dialect
database: SQL_SERVER
show-sql: true
hibernate:
ddl-auto: none
properties:
hibernate.dialect: org.hibernate.dialect.SQLServer2008Dialect
hibernate.jdbc.batch_size: 20
hibernate.cache.use_second_level_cache: false
hibernate.enable_lazy_load_no_trans: true
Any particular reason for using char instead of varchar in the db?
That's a question I don't have the exact answer to. This is how the database was designed. Customer number is a foreign key to another table, where the primary key was created as Char. These are some old tables and in order to keep that standard, they keep going with char for legacy fields.
EDIT 3:
@Query(value = "SELECT REP_BAL_QTY\r\n" + "FROM ODH_INV.SBT_INVEN\r\n"
+ "WHERE CUST_NBR = RIGHT(CONCAT('0000000000', CAST(:customerNumber AS varchar)), 10) "
+ "AND EAN_UPC = RIGHT(CONCAT('0000000000000', CAST(:eanUpc AS varchar)), 13)", nativeQuery = true)
Optional<Integer> findReportedBalanceQuantityByCustomerNumberAndEanUpc(String customerNumber, String eanUpc);
So this is working fine inside a repository, but in this case it's an API I'm exposing, and I only need the REP_BAL_QTY column in the result, so I have to run this query for each row of the page I'm returning. Before I had this mapped in the entity:
@MapsId
@OneToOne
@JoinColumns({
@JoinColumn(name = "EAN_UPC", referencedColumnName = "EAN_UPC"),
@JoinColumn(name = "CUST_NBR", referencedColumnName = "CUST_NBR")
})
@NotFound(action = NotFoundAction.IGNORE)
private SbtInventory sbtInventory;
This was called when the entity was loaded and the process is a bit faster, I think ( besides the obvious, avoiding the boilerplate code )
Using the join in the entity, the process to bring a page of 10 rows lasts:
SbtInventoryController.getData took 84113 ms
This is one of the executed queries for the join:
select sbtinvento0_.CUST_NBR as CUST_NBR1_19_0_, sbtinvento0_.EAN_UPC as EAN_UPC2_19_0_, sbtinvento0_.REP_BAL_QTY as REP_BAL_3_19_0_ from ODH_INV.SBT_INVEN sbtinvento0_ where sbtinvento0_.CUST_NBR=? and sbtinvento0_.EAN_UPC=?
Using the query, the process lasts:
SbtInventoryController.getData took 14063 ms
So it's a huge performance jump. The downside is that by using the query, I have to drop sorting on the column holding REP_BAL_QTY :(
Solution
One month later, we finally managed to identify what causes the issue.
After talking with our DBA, we were provided with the evaluation plans for our queries, where we noticed that some of them had a CONVERT_IMPLICIT function call in the predicate:
CONVERT_IMPLICIT(nchar(18),[dbo].[PRICING].[MATL_NBR] as [pricing0_].[MATL_NBR],0)=[@P1]
On this query, the CONVERT_IMPLICIT call was causing it to run in 150ms and to not take leverage of the INDEX applied to the table.
Searching for this CONVERT_IMPLICIT function, I found this article which was mentioning what I deducted, that the query will use more resources and the proper index won't be selected:
https://blog.sqlauthority.com/2018/06/11/sql-server-how-to-fix-convert_implicit-warnings/
Searching on how to disable this behavior, I found the following article from a few months ago written by Vlad Mihalcea, to which I would like to give the credits for the solution:
https://vladmihalcea.com/sql-server-jdbc-sendstringparametersasunicode/
The article is specifying that a default property is set on the JDBC driver, sendStringParametersAsUnicode, which should always be disabled, as it greatly impacts the performance of queries. The article is also mentioning that Microsoft suggests the same thing:
When working with CHAR, VARCHAR, and LONGVARCHAR data, users can set the connection property sendStringParametersAsUnicode to false for optimal performance gain.
So, the solution is quite simple. Simply append sendStringParametersAsUnicode=false; to your database connection string. You should always set this property to false to prevent performance issues on queries.
dbc:sqlserver://localhost;instance=SQLEXPRESS;databaseName=high_performance_sql;sendStringParametersAsUnicode=false;
Answered By - Marius Mihai Vintila