Spring Data JPA: Custom Queries with @Query
- New learners โ start at What is Spring Data JPA? and Why @Query? to understand where custom queries fit.
- Senior engineers โ jump to Persistence Context Internals, N+1 Problem, DTO Projections, or Performance Tuning.
What is Spring Data JPA?โ
Spring Data JPA is a Spring module that eliminates boilerplate data-access code. Instead of writing EntityManager calls manually, you declare a repository interface and Spring generates the implementation at runtime.
Your code Spring Data JPA Database
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
UserRepository โโโ Generated proxy (runtime) โโโ SQL
.findById(1) EntityManager.find() SELECT * FROM users WHERE id = 1
The building blocksโ
| Layer | Responsibility | Example |
|---|---|---|
| Entity | Maps a Java class to a DB table | @Entity class User |
| Repository | Interface declaring data-access operations | UserRepository extends JpaRepository |
| JpaRepository | Spring-provided base with save, findById, delete, pagination, etc. | No code needed |
@Query | Override the generated SQL with your own JPQL or native SQL | When derived methods aren't enough |
What Spring generates without any custom codeโ
// You write this interface โ nothing else
public interface UserRepository extends JpaRepository<User, Long> { }
// Spring generates at startup โ all of these work for free:
userRepository.findById(1L); // SELECT * FROM users WHERE id = 1
userRepository.save(user); // INSERT or UPDATE
userRepository.deleteById(1L); // DELETE FROM users WHERE id = 1
userRepository.findAll(pageable); // SELECT * FROM users LIMIT ? OFFSET ?
userRepository.count(); // SELECT COUNT(*) FROM users
Why @Query?โ
Derived query methods โ the convenient starting pointโ
Spring Data JPA can generate queries by parsing method names. No SQL or JPQL required:
// Spring reads the method name and generates the SQL automatically
List<User> findByStatusAndDepartmentId(String status, Long departmentId);
// โ SELECT * FROM users WHERE status = ? AND department_id = ?
List<User> findByFirstNameContainingIgnoreCase(String keyword);
// โ SELECT * FROM users WHERE LOWER(first_name) LIKE LOWER('%?%')
List<User> findTop5ByStatusOrderByCreatedAtDesc(String status);
// โ SELECT * FROM users WHERE status = ? ORDER BY created_at DESC LIMIT 5
When derived methods break downโ
As queries grow complex, method names become unreadable and unmaintainable:
// โ Derived method โ technically works, practically unreadable
List<User> findByStatusAndDepartmentIdAndCreatedAtAfterAndRoleIn(
String status, Long departmentId, LocalDate date, List<String> roles);
// โ
@Query โ clear, explicit, maintainable
@Query("""
SELECT u FROM User u
WHERE u.status = :status
AND u.department.id = :deptId
AND u.createdAt > :since
AND u.role IN :roles
""")
List<User> findFilteredUsers(
@Param("status") String status,
@Param("deptId") Long deptId,
@Param("since") LocalDate since,
@Param("roles") List<String> roles);
When to use @Queryโ
| Situation | Use @Query? | Reason |
|---|---|---|
| Simple equality / range filter on one or two fields | โ | Derived method is cleaner |
| Three or more filter conditions | โ | Derived method name becomes unreadable |
| JOIN across related entities | โ | Derived methods can't express JOINs well |
Aggregation (SUM, AVG, COUNT, GROUP BY) | โ | Not expressible as derived method |
Database-specific function (PostgreSQL JSONB, window functions) | โ | Requires native SQL |
| Bulk UPDATE or DELETE | โ | Requires @Modifying + @Query |
| Return a DTO/projection instead of a full entity | โ | Constructor expression or interface projection |
JPQL vs Native SQLโ
Before writing queries, understand the difference between the two query modes:
JPQL (Java Persistence Query Language)โ
JPQL operates on entity classes and their fields, not on database tables and columns. Hibernate translates JPQL into SQL at runtime.
// JPQL โ references the Java entity name "User" and field "firstName"
@Query("SELECT u FROM User u WHERE u.firstName = :name")
// Hibernate translates to SQL โ references the actual table "users" and column "first_name"
// โ SELECT u.id, u.first_name, u.last_name, ... FROM users u WHERE u.first_name = ?
Advantages of JPQL:
- Database-agnostic โ works on PostgreSQL, MySQL, Oracle, H2 without changes.
- Hibernate can apply dialect-specific optimisations during AST translation.
- Works with Hibernate's L1/L2 caches and lazy loading.
- Refactoring entity field names (with an IDE) updates all JPQL references.
Native SQLโ
Native SQL is sent directly to the database without JPQL parsing. Use it when you need database-specific features unavailable in JPQL.
// Native SQL โ references the actual table and column names
@Query(
value = "SELECT * FROM users WHERE properties->>'timezone' = :tz",
nativeQuery = true
)
List<User> findByTimezone(@Param("tz") String timezone);
When to use native SQL:
- Database-specific functions: PostgreSQL
JSONB,ARRAY,LATERAL,WITH(CTEs). - Window functions (
ROW_NUMBER(),RANK(),LAG()). - Full-text search with
tsvector/MATCH AGAINST. - Complex
UNION,INTERSECT, or recursive CTEs.
Native queries bypass Hibernate's dialect translation. Migrating from MySQL to PostgreSQL requires rewriting every nativeQuery = true query. Always add a comment documenting the database dependency, and centralise native queries in one repository to make a future migration manageable.
Side-by-side comparisonโ
| JPQL | Native SQL | |
|---|---|---|
| Operates on | Entity classes & fields | DB tables & columns |
| Database portability | โ Portable | โ DB-specific |
| DB-specific functions | โ Not supported | โ Fully supported |
| Hibernate cache integration | โ Full | โ ๏ธ Limited |
| Performance optimisation | Hibernate AST | Raw driver |
| Refactoring safety | โ IDE-assisted | โ Manual |
| When to use | Default choice | DB-specific features only |
Parameter Bindingโ
Injecting variables into queries safely is fundamental. Spring Data JPA prevents SQL injection by using prepared statement placeholders โ parameters are never concatenated into the query string.
Positional parametersโ
Parameters are referenced by their position in the method signature โ ?1 for the first argument, ?2 for the second, and so on.
@Query("SELECT u FROM User u WHERE u.firstName = ?1 AND u.lastName = ?2")
Optional<User> findByFullName(String firstName, String lastName);
If you reorder method parameters during a refactor, positional references silently break โ ?1 now maps to the wrong argument. Prefer named parameters.
Named parameters (recommended)โ
Use @Param("name") to bind parameters explicitly by name. The query reference is :name. The order of method arguments no longer matters.
@Query("""
SELECT u FROM User u
WHERE u.firstName = :firstName
AND u.lastName = :lastName
AND u.status = :status
""")
Optional<User> findByFullNameAndStatus(
@Param("firstName") String firstName,
@Param("lastName") String lastName,
@Param("status") String status
);
Passing collectionsโ
JPQL's IN clause accepts a Collection parameter directly โ no manual string-building needed:
@Query("SELECT u FROM User u WHERE u.role IN :roles AND u.status = :status")
List<User> findByRolesAndStatus(
@Param("roles") List<String> roles,
@Param("status") String status
);
// Usage
repository.findByRolesAndStatus(List.of("ADMIN", "MANAGER"), "ACTIVE");
// โ SELECT ... FROM users WHERE role IN ('ADMIN', 'MANAGER') AND status = 'ACTIVE'
LIKE patternsโ
JPQL does not auto-wrap :param in wildcards. You must either provide them from the caller or use CONCAT:
// Option 1: wrap at call site
@Query("SELECT u FROM User u WHERE u.email LIKE :pattern")
List<User> findByEmailPattern(@Param("pattern") String pattern);
// Caller: repository.findByEmailPattern("%@gmail.com")
// Option 2: wrap inside the query with CONCAT
@Query("SELECT u FROM User u WHERE u.email LIKE CONCAT('%', :domain, '%')")
List<User> findByEmailDomain(@Param("domain") String domain);
// Caller: repository.findByEmailDomain("gmail.com")
Sorting and Paginationโ
Dynamic sortingโ
Pass a Sort object as a parameter โ Spring Data appends an ORDER BY clause to your JPQL at runtime without you writing it:
@Query("SELECT u FROM User u WHERE u.status = :status")
List<User> findByStatus(@Param("status") String status, Sort sort);
// Usage โ build Sort programmatically
Sort sort = Sort.by(Sort.Direction.DESC, "createdAt")
.and(Sort.by(Sort.Direction.ASC, "lastName"));
repository.findByStatus("ACTIVE", sort);
// โ SELECT ... FROM users WHERE status = 'ACTIVE'
// ORDER BY created_at DESC, last_name ASC
Pagination with Pageableโ
Pageable adds LIMIT + OFFSET and returns a Page<T> that also contains total count metadata (used for building pagination UI):
@Query("SELECT u FROM User u WHERE u.role = :role")
Page<User> findByRolePaginated(@Param("role") String role, Pageable pageable);
// Usage
Pageable pageable = PageRequest.of(
0, // page number (0-indexed)
20, // page size
Sort.by("createdAt").descending() // optional sort
);
Page<User> page = repository.findByRolePaginated("ADMIN", pageable);
page.getContent(); // List<User> โ the 20 records
page.getTotalElements(); // total matching records (e.g. 847)
page.getTotalPages(); // 847 / 20 = 43 pages
page.hasNext(); // true
Page<T> automatically runs a SELECT COUNT(*) alongside your data query. If your main query is expensive (many JOINs), the count query may also be slow. You can provide a separate, optimised count query:
@Query(
value = "SELECT u FROM User u JOIN FETCH u.roles WHERE u.status = :status",
countQuery = "SELECT COUNT(u) FROM User u WHERE u.status = :status"
)
Page<User> findActiveUsersWithRoles(@Param("status") String status, Pageable pageable);
Data Modification โ @Modifyingโ
@Query alone only reads data. To execute UPDATE or DELETE statements, pair it with @Modifying:
// โ
UPDATE โ returns number of affected rows
@Modifying
@Transactional
@Query("UPDATE User u SET u.status = 'INACTIVE' WHERE u.lastLoginDate < :threshold")
int deactivateDormantAccounts(@Param("threshold") LocalDate threshold);
// โ
DELETE โ returns number of deleted rows
@Modifying
@Transactional
@Query("DELETE FROM User u WHERE u.status = 'DELETED' AND u.updatedAt < :before")
int purgeDeletedUsers(@Param("before") LocalDate before);
@Modifying without @Transactional throws InvalidDataAccessApiUsageException at runtime. Either annotate the repository method or ensure the calling service method is transactional.
Bulk operation rulesโ
| Rule | Why |
|---|---|
Always use @Modifying for UPDATE/DELETE | Without it, Spring assumes read-only โ exception at runtime |
Add @Transactional | DML requires an active transaction |
Use clearAutomatically = true | Clears the L1 cache after the query โ prevents stale entity reads (see senior section) |
Use flushAutomatically = true | Flushes pending entity changes before executing โ prevents lost updates |
SpEL Expressionsโ
Spring Expression Language (SpEL) can be embedded in @Query strings inside #{ ... }. The most useful built-in SpEL variable is #entityName โ it resolves to the entity name declared on the @Entity annotation, enabling reusable base queries in generic repository code.
// Without SpEL โ hardcoded entity name breaks if the entity is renamed
@Query("SELECT u FROM User u WHERE u.email LIKE %:domain%")
// With SpEL โ resolves the entity name dynamically
@Query("SELECT e FROM #{#entityName} e WHERE e.email LIKE %:domain%")
List<T> findByEmailDomain(@Param("domain") String domain);
Most common SpEL use cases:
// In a generic base repository shared by many entities
@NoRepositoryBean
public interface BaseRepository<T, ID> extends JpaRepository<T, ID> {
// #{#entityName} โ resolves to "User", "Product", "Order", etc. at runtime
@Query("SELECT e FROM #{#entityName} e WHERE e.createdBy = :userId")
List<T> findAllCreatedBy(@Param("userId") Long userId);
}
// Works for all entities that extend BaseRepository
public interface UserRepository extends BaseRepository<User, Long> { }
public interface ProductRepository extends BaseRepository<Product, Long> { }
Projections โ Return Only What You Needโ
By default, Spring Data returns fully hydrated entity objects. This is wasteful when you only need 2โ3 fields from a 30-column table โ Hibernate must read, allocate, and dirty-track the entire entity graph.
Interface projections (Spring Data feature)โ
Declare a Java interface with getter methods matching the fields you want. Spring generates a proxy that reads only those columns.
// Projection interface โ only id, firstName, email
public interface UserSummary {
Long getId();
String getFirstName();
String getEmail();
}
// Return the projection type directly โ no @Query needed for simple cases
public interface UserRepository extends JpaRepository<User, Long> {
List<UserSummary> findByStatus(String status);
// Spring generates: SELECT u.id, u.first_name, u.email FROM users WHERE status = ?
}
Class-based DTO projections with @Queryโ
For complex queries (aggregations, computed fields, multiple entities), use JPQL constructor expressions to project results directly into a DTO class:
// UserSummaryDto.java โ a plain class, not an entity
@Getter
@AllArgsConstructor
public class UserSummaryDto {
private final Long id;
private final String firstName;
private final String email;
private final Long orderCount; // computed field โ not on the entity
}
// Repository
@Query("""
SELECT new dev.example.dto.UserSummaryDto(
u.id, u.firstName, u.email, COUNT(o.id)
)
FROM User u
LEFT JOIN u.orders o
WHERE u.status = 'ACTIVE'
GROUP BY u.id, u.firstName, u.email
""")
List<UserSummaryDto> getActiveUserSummaries();
| Full entity | Interface projection | DTO projection | |
|---|---|---|---|
| Columns fetched | All | Only declared getters | Only constructor args |
| Hibernate tracking | โ Dirty-checked | โ No tracking | โ No tracking |
| Proxy overhead | Heavy | Light proxy | None |
| Memory usage | High | Low | Lowest |
| Use for | Write operations | Simple reads | Complex reads / aggregations |
Complete Use-Case Examplesโ
- Search & filter
- Aggregation / reporting
- Native โ window functions
- Bulk DML
@Query("""
SELECT u FROM User u
WHERE (:status IS NULL OR u.status = :status)
AND (:role IS NULL OR u.role = :role)
AND (:search IS NULL OR LOWER(u.firstName) LIKE LOWER(CONCAT('%', :search, '%'))
OR LOWER(u.email) LIKE LOWER(CONCAT('%', :search, '%')))
""")
Page<User> searchUsers(
@Param("status") String status,
@Param("role") String role,
@Param("search") String search,
Pageable pageable
);
// Caller passes null for fields that should not filter
// repository.searchUsers("ACTIVE", null, "alice", PageRequest.of(0, 20))
public interface DepartmentStat {
String getDepartmentName();
Long getUserCount();
Double getAverageSalary();
}
@Query("""
SELECT d.name AS departmentName,
COUNT(u.id) AS userCount,
AVG(u.salary) AS averageSalary
FROM Department d
LEFT JOIN d.users u
GROUP BY d.id, d.name
ORDER BY COUNT(u.id) DESC
""")
List<DepartmentStat> getDepartmentStats();
@Query(
value = """
SELECT
u.id,
u.first_name,
u.department_id,
SUM(o.total) OVER (PARTITION BY u.department_id) AS dept_total,
RANK() OVER (PARTITION BY u.department_id ORDER BY SUM(o.total) DESC) AS dept_rank
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.created_at >= :since
GROUP BY u.id, u.first_name, u.department_id
""",
nativeQuery = true
)
List<Object[]> getUserSalesRankByDepartment(@Param("since") LocalDate since);
// Bulk status update โ affects many rows in one SQL call
@Modifying(clearAutomatically = true, flushAutomatically = true)
@Transactional
@Query("UPDATE User u SET u.status = :newStatus WHERE u.id IN :ids")
int bulkUpdateStatus(@Param("newStatus") String newStatus, @Param("ids") List<Long> ids);
// Soft delete with audit trail
@Modifying(clearAutomatically = true)
@Transactional
@Query("""
UPDATE User u
SET u.status = 'DELETED',
u.deletedAt = CURRENT_TIMESTAMP,
u.deletedBy = :actorId
WHERE u.id = :userId AND u.status != 'DELETED'
""")
int softDeleteUser(@Param("userId") Long userId, @Param("actorId") Long actorId);
Persistence Context Internalsโ
The sections below focus on JVM memory mechanics, Hibernate internals, and production performance patterns. Understanding these separates developers who write working queries from engineers who write efficient ones.
The L1 cache (First-Level Cache)โ
Every JPA transaction has a persistence context โ Hibernate's L1 cache. It is a Map<EntityKey, Object> that lives for the duration of the transaction. Every entity you load is placed in this map.
Transaction starts
โ
โโ em.find(User.class, 1L) โ MISS โ SELECT FROM db โ stores in L1 cache
โโ em.find(User.class, 1L) โ HIT โ returns from cache, no SQL
โโ em.find(User.class, 2L) โ MISS โ SELECT FROM db โ stores in L1 cache
โ
โโ user.setName("Alice Updated") โ entity is "dirty" in the cache
โ
Transaction commits โ Hibernate detects dirty entity โ flushes UPDATE to DB
โ
Cache discarded
Key implications:
- Two calls to
findById(1L)in the same transaction hit the DB only once โ safe to call freely. - Hibernate tracks every managed entity for changes (dirty checking) at flush time. Loading 10,000 entities means Hibernate compares 10,000 objects against their snapshots at commit โ expensive.
- The L1 cache is not shared between transactions or threads.
The stale-cache problem with @Modifyingโ
A bulk UPDATE query executes as a single SQL statement directly against the database, completely bypassing the L1 cache. If you had already loaded an entity in the same transaction, your Java object is now out of sync with the database:
@Transactional
public void example() {
// Step 1: loads User id=1 into L1 cache โ status = "ACTIVE"
User user = userRepository.findById(1L).orElseThrow();
System.out.println(user.getStatus()); // "ACTIVE"
// Step 2: bulk UPDATE bypasses the cache entirely
userRepository.deactivateDormantAccounts(LocalDate.now().minusMonths(6));
// DB row for id=1 is now status = "INACTIVE"
// Step 3: reads from L1 cache โ returns stale data!
System.out.println(user.getStatus()); // โ still "ACTIVE" โ stale!
// Step 4: even re-fetching by ID returns from cache
User reloaded = userRepository.findById(1L).orElseThrow();
System.out.println(reloaded.getStatus()); // โ still "ACTIVE" โ cache hit!
}
The fix โ clearAutomatically = true:
@Modifying(
clearAutomatically = true, // evicts all entities from L1 cache after execution
flushAutomatically = true // flushes pending dirty entities to DB before execution
)
@Query("UPDATE User u SET u.status = 'INACTIVE' WHERE u.lastLoginDate < :threshold")
int deactivateDormantAccounts(@Param("threshold") LocalDate threshold);
With clearAutomatically = true:
Step 1: findById(1) โ loads into L1 cache
Step 2: @Modifying UPDATE โ executes SQL โ clears L1 cache
Step 3: findById(1) โ MISS (cache cleared) โ fresh SELECT from DB โ
clearAutomatically clears the entire L1 cacheAll currently tracked entities are evicted โ not just the ones affected by your query. Any entity loaded before this call that you planned to use afterward must be re-fetched. Design your transaction flow to run bulk modifications before loading entities you need afterward.
The N+1 Problemโ
N+1 is the most common JPA performance mistake. It occurs when loading a list of N entities, then accessing a lazy-loaded relationship on each โ triggering N additional queries:
@Entity
public class User {
@OneToMany(fetch = FetchType.LAZY) // lazy โ not loaded with the user
private List<Order> orders;
}
// The N+1 scenario
@Transactional
public void printUserOrders() {
List<User> users = userRepository.findAll(); // 1 SELECT: loads 100 users
for (User u : users) {
// Accessing the lazy collection triggers a query per user:
System.out.println(u.getOrders().size());
// 100 SELECTs: SELECT * FROM orders WHERE user_id = ?
}
// Total: 1 + 100 = 101 queries โ
}
In a high-throughput environment, 101 queries for one endpoint depletes HikariCP connection pool threads, causing downstream timeouts for all other requests.
Solving N+1 with JOIN FETCHโ
JOIN FETCH tells Hibernate to load the association in the same query as the parent:
// Single query fetches users AND their orders together
@Query("SELECT u FROM User u JOIN FETCH u.orders WHERE u.status = :status")
List<User> findActiveUsersWithOrders(@Param("status") String status);
// โ SELECT u.*, o.* FROM users u INNER JOIN orders o ON o.user_id = u.id WHERE u.status = ?
// Total: 1 query โ
JOIN FETCH vs LEFT JOIN FETCHโ
// INNER JOIN FETCH โ only returns users who HAVE at least one order
@Query("SELECT u FROM User u JOIN FETCH u.orders WHERE u.status = :status")
// LEFT JOIN FETCH โ returns all users, with empty list if they have no orders
@Query("SELECT u FROM User u LEFT JOIN FETCH u.orders WHERE u.status = :status")
You cannot safely combine JOIN FETCH with Pageable:
// โ This produces a Hibernate warning and loads ALL rows into memory first:
@Query("SELECT u FROM User u JOIN FETCH u.orders")
Page<User> findAllWithOrders(Pageable pageable);
// WARNING: HHH90003004: firstResult/maxResults specified with collection fetch;
// applying in memory!
// โ All rows loaded into JVM, then sliced โ defeats the purpose of pagination
Solutions:
Option 1 โ Use @EntityGraph instead (works with pagination):
@EntityGraph(attributePaths = {"orders"})
Page<User> findByStatus(String status, Pageable pageable);
Option 2 โ Separate queries (batch fetch pattern):
// Step 1: paginate user IDs only
@Query("SELECT u.id FROM User u WHERE u.status = :status")
Page<Long> findUserIdsByStatus(@Param("status") String status, Pageable pageable);
// Step 2: fetch users + orders for just those IDs
@Query("SELECT u FROM User u JOIN FETCH u.orders WHERE u.id IN :ids")
List<User> findUsersWithOrdersByIds(@Param("ids") List<Long> ids);
@EntityGraph vs JOIN FETCHโ
Both solve N+1, but in different ways:
- @EntityGraph
- JOIN FETCH
// Defined on the entity:
@Entity
@NamedEntityGraph(
name = "User.withRolesAndDepartment",
attributeNodes = {
@NamedAttributeNode("roles"),
@NamedAttributeNode("department")
}
)
public class User { ... }
// Used in the repository:
@EntityGraph("User.withRolesAndDepartment")
Optional<User> findByUsername(String username);
// Or inline without @NamedEntityGraph:
@EntityGraph(attributePaths = {"roles", "department"})
List<User> findByStatus(String status);
Pros: works with derived methods and pagination. Spring generates a LEFT OUTER JOIN.
Cons: always LEFT JOIN โ cannot specify INNER JOIN. Less control over the generated SQL.
@Query("SELECT u FROM User u JOIN FETCH u.roles JOIN FETCH u.department WHERE u.status = :status")
List<User> findActiveUsersWithRolesAndDepartment(@Param("status") String status);
Pros: full control โ choose INNER or LEFT JOIN per relationship. Can add WHERE conditions on joined tables. Explicit and readable.
Cons: cannot be combined with Pageable (loads all into memory). More verbose.
@EntityGraph | JOIN FETCH | |
|---|---|---|
| Works with Pageable | โ Yes | โ No (in-memory warning) |
| Works with derived methods | โ Yes | โ No |
| JOIN type control | โ Always LEFT OUTER | โ INNER or LEFT |
| Condition on joined table | โ No | โ Yes |
| Multiple collections | โ ๏ธ Cartesian product risk | โ ๏ธ Cartesian product risk |
Fetching two @OneToMany collections in one query produces a Cartesian product โ if a user has 5 orders and 3 roles, the result set has 15 rows for that user. For JOIN FETCH, Hibernate deduplicates with DISTINCT. For large collections this is still wasteful:
// โ Cartesian product โ avoid for large collections
@Query("SELECT DISTINCT u FROM User u JOIN FETCH u.orders JOIN FETCH u.roles")
// โ
Better: use @BatchSize on the entity collections (separate queries, but batched)
@OneToMany
@BatchSize(size = 50) // loads roles for 50 users in one IN query
private List<Role> roles;
DTO Projections for Memory Optimizationโ
Fetching full entities forces Hibernate to:
- Read every column from the result set.
- Allocate a Java object for each entity.
- Store a deep snapshot copy for dirty checking at flush time.
- Maintain all of this in the L1 cache for the transaction duration.
For read-heavy endpoints, this is wasteful. DTO projections skip all of it:
// โ Fetches all 30 columns, allocates entity proxies, enables dirty tracking
List<User> users = userRepository.findByStatus("ACTIVE");
// โ
Fetches 4 columns, no proxy, no dirty checking, no L1 cache tracking
@Query("""
SELECT new dev.example.dto.UserSummaryDto(
u.id, u.firstName, u.email, COUNT(o.id)
)
FROM User u
LEFT JOIN u.orders o
WHERE u.status = 'ACTIVE'
GROUP BY u.id, u.firstName, u.email
""")
List<UserSummaryDto> getActiveUserSummaries();
Memory impact at scale:
Full entity fetch โ 10,000 users:
Per entity: ~2KB (entity object + Hibernate snapshot + proxy metadata)
Total heap: ~20 MB held for the duration of the transaction
DTO projection โ 10,000 users (4 fields):
Per DTO: ~80 bytes (plain object, no tracking)
Total heap: ~800 KB โ 25ร less memory pressure
Performance Tuningโ
๐ฌ Senior deep-dive: query hints
@QueryHints passes hints to the underlying JPA provider to control caching, lock modes, and query timeouts:
@QueryHints({
@QueryHint(name = "org.hibernate.readOnly", value = "true"), // no dirty checking
@QueryHint(name = "org.hibernate.fetchSize", value = "50"), // JDBC fetch size
@QueryHint(name = "jakarta.persistence.query.timeout", value = "5000") // 5s timeout
})
@Query("SELECT u FROM User u WHERE u.status = 'ACTIVE'")
List<User> findActiveUsersReadOnly();
| Hint | Effect |
|---|---|
org.hibernate.readOnly = true | Disables dirty checking โ reduces memory, speeds up flush |
org.hibernate.fetchSize = N | JDBC streaming batch size โ reduces memory for large result sets |
jakarta.persistence.query.timeout | Cancels slow queries after N milliseconds |
org.hibernate.cacheable = true | Enables the L2 (query result) cache for this query |
๐ฌ Senior deep-dive: HikariCP and connection pool starvation
For detailed guidelines on pool sizing, parameter details, starvation patterns, and Micrometer instrumentation, see the centralized Database Connection Pooling guide.
๐ฌ Senior deep-dive: streaming large result sets
Loading 100,000 rows into a List blows up heap memory. Stream them instead:
// Returns a lazy Java Stream โ rows are fetched from JDBC in batches, not all at once
@QueryHints(@QueryHint(name = "org.hibernate.fetchSize", value = "100"))
@Query("SELECT u FROM User u WHERE u.status = 'ACTIVE'")
Stream<User> streamActiveUsers();
// Usage โ must be inside a transaction; close the stream when done
@Transactional(readOnly = true)
public void exportUsers(OutputStream out) {
try (Stream<User> stream = userRepository.streamActiveUsers()) {
stream.map(userMapper::toDto)
.forEach(dto -> csvWriter.write(out, dto));
}
// Stream closed โ cursor closed โ connection returned to pool
}
A Stream<T> from Spring Data holds an open JDBC cursor and therefore an active database connection. Not closing it leaks the connection. Always use try-with-resources or call .close() explicitly.
๐ฌ Senior deep-dive: @Query vs Specification vs QueryDSL
For dynamic queries where filters change per request, @Query with optional parameter tricks (IS NULL OR field = :param) is limited. Compare the alternatives:
@Query | Specification (JPA Criteria) | QueryDSL | |
|---|---|---|---|
| Static queries | โ Best | Overkill | Overkill |
| Dynamic filters | โ ๏ธ Hacky (IS NULL tricks) | โ Designed for this | โ Designed for this |
| Type safety | โ String-based | โ Type-safe at compile time | โ Type-safe |
| Readability | โ SQL-like, familiar | โ Verbose Criteria API | โ Fluent, readable |
| Pagination support | โ Native | โ Native | โ Native |
| Setup cost | None | None (built-in) | Requires APT plugin |
// Dynamic search with Specification โ composable predicates
public class UserSpecifications {
public static Specification<User> hasStatus(String status) {
return (root, query, cb) ->
status == null ? null : cb.equal(root.get("status"), status);
}
public static Specification<User> nameContains(String keyword) {
return (root, query, cb) ->
keyword == null ? null : cb.like(
cb.lower(root.get("firstName")),
"%" + keyword.toLowerCase() + "%"
);
}
}
// Compose at call site โ only active conditions are applied
Specification<User> spec = where(hasStatus("ACTIVE")).and(nameContains("alice"));
Page<User> results = userRepository.findAll(spec, PageRequest.of(0, 20));
Decision guide:
- Simple, fixed query โ
@Query. - Dynamic filters that change per request โ
Specification. - Complex dynamic queries with joins across many tables โ
QueryDSL.
Testing @Query Methodsโ
Test repository queries with @DataJpaTest โ it loads only the JPA slice (no controllers, no services) and uses an in-memory H2 database by default:
@DataJpaTest
class UserRepositoryTest {
@Autowired
private UserRepository userRepository;
@Autowired
private TestEntityManager em;
@BeforeEach
void setUp() {
em.flush();
}
@Test
void findByStatus_returnsOnlyActiveUsers() {
List<User> result = userRepository.findByStatus("ACTIVE", Sort.unsorted());
assertThat(result).hasSize(2)
.extracting(User::getUsername)
.containsExactlyInAnyOrder("alice", "bob");
}
@Test
void getActiveUserSummaries_includesOrderCount() {
// persist an order for alice
em.persist(new Order("alice-id", BigDecimal.TEN));
em.flush();
List<UserSummaryDto> summaries = userRepository.getActiveUserSummaries();
assertThat(summaries).anyMatch(s ->
s.getFirstName().equals("alice") && s.getOrderCount() == 1L);
}
@Test
void deactivateDormantAccounts_updatesStatusAndReturnsCount() {
int affected = userRepository.deactivateDormantAccounts(
LocalDate.now().minusMonths(6));
assertThat(affected).isGreaterThanOrEqualTo(0);
em.clear(); // clear L1 cache before asserting DB state
User alice = em.find(User.class, /* alice's id */);
assertThat(alice.getStatus()).isEqualTo("INACTIVE");
}
}
H2 doesn't support PostgreSQL-specific syntax (JSONB, LATERAL, ON CONFLICT). Test native queries against a real database using Testcontainers:
@DataJpaTest
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
@Testcontainers
class NativeQueryTest {
@Container
static PostgreSQLContainer<?> postgres =
new PostgreSQLContainer<>("postgres:16-alpine");
@DynamicPropertySource
static void configureProperties(DynamicPropertyRegistry registry) {
registry.add("spring.datasource.url", postgres::getJdbcUrl);
registry.add("spring.datasource.username", postgres::getUsername);
registry.add("spring.datasource.password", postgres::getPassword);
}
// ... test methods
}
Common Mistakesโ
| Mistake | Why it's a problem | Fix |
|---|---|---|
Using @Modifying without @Transactional | Throws InvalidDataAccessApiUsageException at runtime | Add @Transactional to the repository method |
@Modifying without clearAutomatically = true | Stale entities in L1 cache cause incorrect reads after bulk update | Set clearAutomatically = true |
JOIN FETCH with Pageable | Hibernate loads all rows into memory, then slices in JVM โ OutOfMemoryError risk | Use @EntityGraph or two-query pattern |
| Fetching full entities for read-only endpoints | Unnecessary heap pressure from dirty-checking snapshots | Use DTO projections or @QueryHint readOnly = true |
| N+1 on every request | 100 users ร 1 lazy query = 101 DB round trips, pool starvation | JOIN FETCH or @EntityGraph |
SELECT * patterns in native queries | Couples result mapping to table schema โ breaks on column additions | Select only the columns you need |
Not closing Stream<T> result | Open JDBC cursor leaks the database connection โ pool exhaustion over time | Always use try-with-resources |
| Hardcoding entity name in SpEL-eligible base queries | Breaks when entity is renamed | Use #{#entityName} in generic repositories |
๐ฏ Interview Questionsโ
Q1. What is the difference between JPQL and native SQL in Spring Data JPA? When do you choose each?
JPQL operates on JPA entity classes and their fields โ Hibernate translates it to SQL using the configured dialect. It is database-portable, integrates with the L1/L2 caches, and is refactoring-safe. Native SQL is sent directly to the database unchanged โ use it only for database-specific features unavailable in JPQL:
JSONBoperations, window functions, recursive CTEs, full-text search. Default to JPQL; switch to native only when you hit its limits.
Q2. What is the N+1 problem and how does JOIN FETCH solve it?
N+1 occurs when loading N parent entities and then accessing a lazy-loaded collection on each โ triggering one query per parent. 100 users with lazy
orders= 1 + 100 = 101 queries.JOIN FETCHloads the parent and the association in a single SQL JOIN query, reducing 101 queries to 1. The trade-off:JOIN FETCHcannot be combined withPageablewithout loading all rows into memory first.
Q3. Why must @Modifying queries use clearAutomatically = true?
Bulk
UPDATEandDELETEqueries execute directly against the database, bypassing the JPA persistence context (L1 cache). Any entities already loaded in the same transaction hold stale data. WithoutclearAutomatically = true, subsequent reads in the same transaction return cached (stale) values even after afindById. With it, the L1 cache is cleared after the bulk operation, forcing fresh DB reads.
Q4. When would you use a DTO projection instead of returning a full entity?
For read-only, high-frequency endpoints where you need only a subset of fields. Full entities incur: reading all columns, allocating the entity object, creating a dirty-checking snapshot, and storing everything in the L1 cache. DTO projections skip all of this โ Hibernate reads only the projected columns, allocates a plain object, and does no tracking. For a list of 10,000 records, this can reduce heap usage by 10โ25ร.
Q5. (Senior) How does HikariCP connection pool starvation relate to N+1 queries?
Each query borrows a connection from HikariCP for the duration of the call. N+1 queries inside a single transaction hold a connection open while issuing serial queries, quickly depleting the pool and causing other requests to time out. For details, sizing, and solutions, see Database Connection Pooling.
Q6. (Senior) How do @EntityGraph and JOIN FETCH differ, and when do you choose each?
Both solve N+1 by eagerly loading associations in a JOIN. Differences:
@EntityGraphalways uses LEFT OUTER JOIN, works withPageableand derived methods, but offers no control over JOIN type or conditions on the joined table.JOIN FETCHin@Queryallows INNER or LEFT JOIN, supports WHERE conditions on the joined table, and is explicit โ but cannot be safely combined withPageable. Rule of thumb: use@EntityGraphfor paginated lists or when derived methods are sufficient; useJOIN FETCHin@Querywhen you need precise SQL control and are not paginating.
Q7. (Senior) What happens when you call @Modifying without flushAutomatically = true, and there are pending entity changes in the same transaction?
By default (
flushAutomatically = false), the persistence context has not yet flushed dirty entity changes to the database when the bulk query runs. If you loaded a user, setuser.setStatus("ACTIVE"), and then ran a bulk UPDATE affecting that same row โ the bulk UPDATE may overwrite the in-memory change with the old value from DB, or the subsequent flush may overwrite the bulk result. WithflushAutomatically = true, Hibernate first flushes all pending changes to the DB, then executes the bulk query โ ensuring a consistent ordering. The safe pattern: always set bothflushAutomatically = trueandclearAutomatically = trueon@Modifyingqueries.