[Book][Refactor] Improve readability of the Search -> BookSave flow and the concurrency logic that involves the crawler #201

sunwon12 · 2026-01-05T10:15:33Z

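🔍 Search -> BookSave flow refactoring and performance optimization

Background and problems (Problem)

While saving search results to the DB, the existing logic ran into concurrency issues, limits imposed by the external crawler target, and performance bottlenecks.

Concurrency conflicts: a data integrity exception occurred when multiple users tried to save the same book at the same time.
Crawling limits: 429 Too Many Requests errors caused by request limits on the external Aladin API/website.
Performance degradation:
- Unnecessary lookups and persistence-context management cost when calling JPA's save()
- N+1 problem and slow writes when saving books with many chapters (70 or more)
- Connection setup cost caused by using Jsoup
- With OSIV enabled, connections are held longer than the transaction's lifetime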

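Solutions and trade-offs (Solutions & Trade-offs)

| Problem | Candidates | Chosen solution | Rationale and trade-offs |
|---|---|---|---|
| Concurrency control | 1. Redis Distributed Lock (ItemId) 2. Transaction separation (`REQUIRES_NEW`) 3. Native Query (INSERT IGNORE) | INSERT IGNORE | 1. Distributed lock: locking per `itemId` costs too much performance relative to how often conflicts occur. 2. REQUIRES_NEW: holds twice as many DB connections, a serious waste of resources. 3. Try-Catch: when a plain `save()` throws, the persistence context is tainted and follow-up recovery (a lookup) becomes impossible. -> INSERT IGNORE: avoids all of the above and is the most efficient. |
| Code flow readability | | Find-Get separation | Reason: better readability; the flow of the business logic is expressed clearly. |
| Crawler control | 1. Unlimited requests 2. Sequential execution 3. Semaphore | Semaphore (10 permits) | Reason: complies with Aladin's time limits and request limits. Unlimited requests risk an IP ban; sequential execution is too slow. Trade-off: overall throughput is capped, but stability is secured. |
| Bulk data writes | 1. JPA `saveAll` (with Batch Size configured) 2. JDBC Template Bulk | JDBC Bulk Insert | Reason: for books with 70 or more chapters, JPA dirty checking and per-row queries are too slow. Trade-off: higher implementation complexity (hand-written SQL) and reduced type safety, but far better performance. |
| HTTP client | 1. Jsoup 2. WebClient | WebClient | Reason: uses a connection pool and minimizes the cost of repeatedly opening and closing connections. |
| DB connections | 1. OSIV On 2. OSIV Off | OSIV Off | Reason: returns the connection right after the transaction ends, maximizing resource efficiency. Trade-off: lazy loading is unavailable in the view layer, so DTO conversion or initialization must be completed in the service layer. |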

Key changes (Changes)

BookSaveService: the createBookWithDetails method applies a Find-Get pattern, running the INSERT IGNORE query and then looking the row up with findByAladinBookId.
BookRepository: implements insertIgnoreBook as a native query.
AladinCrawlerBatchProcessor: introduces Semaphore(10) to cap the number of concurrent crawl requests.
AladinCrawlerService: replaces Jsoup with WebClient and configures a 10-second timeout and retry logic.
ChapterRepository: implements bulk insert/update/delete using JDBC Template.
Configuration: the OSIV (open-in-view) setting is presumably changed to false (reflected in the code).
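
For reviewers who want to see the insert-then-find flow in isolation, here is a minimal in-memory sketch. It assumes a unique key on the Aladin book id. `InMemoryBookStore`, `insertIgnore`, and `saveIfAbsentThenGet` are hypothetical names, and `ConcurrentHashMap.putIfAbsent` only stands in for the native INSERT IGNORE query, so this illustrates the control flow rather than the repository code in this PR.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a book table with a unique key on the Aladin book id.
class InMemoryBookStore {
    static final class Book {
        final long aladinBookId;
        final String title;

        Book(long aladinBookId, String title) {
            this.aladinBookId = aladinBookId;
            this.title = title;
        }
    }

    private final Map<Long, Book> rows = new ConcurrentHashMap<>();

    // Mimics INSERT IGNORE: returns 1 if a row was inserted, 0 if the key already existed.
    int insertIgnore(Book book) {
        return rows.putIfAbsent(book.aladinBookId, book) == null ? 1 : 0;
    }

    Optional<Book> findByAladinBookId(long id) {
        return Optional.ofNullable(rows.get(id));
    }

    // Insert without failing on duplicates, then read back whichever row won the race.
    Book saveIfAbsentThenGet(Book candidate) {
        insertIgnore(candidate);
        return findByAladinBookId(candidate.aladinBookId)
                .orElseThrow(() -> new IllegalStateException("row missing after insert"));
    }
}
```

The point of the shape is that a duplicate insert is a normal outcome (a 0 return) rather than an exception, so the caller never has to recover from a failed `save()`.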

Other changes (Other Changes)

[Search][Chore] Clean up SearchService code (fd62b56)
[Book][Remove] Delete unused code (6c0d32a)
[Book][Refactor] Skip saving to bestsellers if the book is already saved (a8d7f8d)
[Quiz][Refactor] Apply JPQL bulk delete when deleting leaf entities (Quiz, Attempt, etc.) (48ff793)
[Test] Switch the repository test environment to H2 in MySQL mode (ad8299a)
[Book][Test] Add categories to the test dataset (b420516)
[Book][Refactor] Change the Book index naming convention (b4101b3)
[Book][Pref] Fetch books already in the DB in a single query to minimize transactions/connections (a1a10ec)

gemini-code-assist · 2026-01-05T10:16:17Z

Summary of Changes

Hello @sunwon12, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

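This pull request focuses on substantially improving the overall performance and stability of the search and book-save flow. It applies a range of optimizations to address the concurrency problems, external API request limits, and inefficient bulk data handling in the existing system. The goal is to improve the user experience and make more efficient use of system resources.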

Highlights

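Performance optimization and concurrency control: to resolve the concurrency issues, external crawler limits, and performance bottlenecks in the existing logic for saving search results, the PR applies several optimizations, including an INSERT IGNORE query, a semaphore, JDBC bulk insert, and a switch to WebClient.
Book-save logic refactoring: to prevent concurrency conflicts when saving a book, JPA's save() is replaced with a native-query INSERT IGNORE, and a Find-Get pattern is applied to improve readability.
More stable crawling: a Semaphore caps the number of concurrent requests to comply with Aladin's API request limits, and WebClient replaces Jsoup to make use of asynchronous non-blocking I/O and a connection pool.
More efficient bulk data handling: to address slow writes for bulk data such as chapters, ChapterRepository implements bulk insert/update/delete with JDBC Template instead of JPA's saveAll.
Better DB connection management: the OSIV (Open-In-View) setting is changed to false so that DB connections are returned right after the transaction ends, maximizing resource efficiency.
Improved deletion of quizzes and related entities: JPQL bulk delete and batch delete logic are introduced for deleting related entities such as quizzes, quiz attempts, and chapters, improving performance, and the quiz-state reset logic is centralized.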

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

github-actions · 2026-01-05T10:18:28Z

Test Results

102 files 102 suites 18s ⏱️
513 tests 513 ✅ 0 💤 0 ❌
523 runs 523 ✅ 0 💤 0 ❌

Results for commit a9692fd.

♻️ This comment has been updated with latest results.

github-actions · 2026-01-05T10:18:30Z

🌻 Test coverage report

| Scope | Coverage | Status |
|---|---|---|
| Overall Project | 52.65% `-1.26%` | 🍏 |
| Files changed | 48.64% | ❌ |

| File | Coverage | Status |
|---|---|---|
| SearchApi.java | 100% | 🍏 |
| ChapterRepository.java | 100% | 🍏 |
| SearchService.java | 97.66% `-0.78%` | 🍏 |
| BookRepository.java | 94.74% `-5.26%` | 🍏 |
| BookCategoryService.java | 88.24% `-3.27%` | 🍏 |
| BookService.java | 80.54% `-10.27%` | 🍏 |
| Book.java | 78.72% `-1.65%` | ❌ |
| BookSaveService.java | 78.38% `-5.86%` | 🍏 |
| ChapterService.java | 68.42% `-29.82%` | 🍏 |
| BestsellerService.java | 53.03% `-41.41%` | ❌ |
| ChapterRepositoryImpl.java | 41.96% `-58.04%` | ❌ |
| BestsellerScheduler.java | 24.53% `-15.09%` | ❌ |
| AdminBlogCrawlerController.java | 19.75% | 🍏 |
| AladinCrawlerService.java | 15.55% `-7.72%` | ❌ |
| AdminAladinCrawlerManualService.java | 10.88% `-0.68%` | ❌ |
| NaverBlogPostCrawlingStrategy.java | 9.64% `-8.85%` | ❌ |

gemini-code-assist

Code Review

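This is an excellent PR that refactors the flow from search to book save and optimizes its performance. The background, problem analysis, and solution selection process written up in the PR description are very clear and logical. The changes applied are all appropriate and well implemented: INSERT IGNORE for concurrency control, JDBC bulk insert for chapter management, WebClient for non-blocking I/O, and a Semaphore to throttle external API calls. Overall the code is more stable, more efficient, and easier to maintain. I'm leaving a few suggestions for further improvement. Great work!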

src/main/java/book/book/book/service/BookService.java

gemini-code-assist · 2026-01-05T10:18:52Z

src/main/java/book/book/crawler/service/AladinCrawlerService.java

+//                .retryWhen(Retry.backoff(2, Duration.ofMillis(500))
+//                        .filter(t -> t instanceof TimeoutException || t instanceof java.io.IOException))


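The retry logic (retryWhen) in the fetchTocApiHtml method is commented out. The fetchBookHtml method already has retry logic applied, so consider uncommenting this part for consistency and to make the external API call more robust. Was there a particular reason for commenting it out?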

Suggested change

-// .retryWhen(Retry.backoff(2, Duration.ofMillis(500))
-// .filter(t -> t instanceof TimeoutException || t instanceof java.io.IOException))
+.retryWhen(Retry.backoff(2, Duration.ofMillis(500))
+        .filter(t -> t instanceof TimeoutException || t instanceof java.io.IOException))
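
For readers less familiar with Reactor, the intent of the suggested operator can be sketched in plain Java: retry at most twice, wait between attempts, and retry only on timeouts and I/O failures. `RetrySketch`, `isRetryable`, and `callWithBackoff` are hypothetical names, and this blocking loop is a stand-in for `Retry.backoff`, not a reimplementation of it. As far as I know the Reactor operator also applies jitter to the delay, which this sketch omits.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

class RetrySketch {
    // Same predicate as the filter in the suggestion: only timeouts and I/O failures are retried.
    static boolean isRetryable(Throwable t) {
        return t instanceof TimeoutException || t instanceof IOException;
    }

    // Runs the call, retrying up to maxRetries times and doubling the delay after each failure.
    static <T> T callWithBackoff(Callable<T> call, int maxRetries, Duration firstDelay) throws Exception {
        Duration delay = firstDelay;
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Give up when retries are exhausted or the failure is not a transient one.
                if (attempt >= maxRetries || !isRetryable(e)) {
                    throw e;
                }
                Thread.sleep(delay.toMillis());
                delay = delay.multipliedBy(2);
            }
        }
    }
}
```

With `maxRetries = 2` a call is attempted at most three times in total, which is worth keeping in mind next to the 10-second timeout mentioned in the description.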

AlphaBs · 2026-01-06T12:58:12Z

src/main/java/book/book/book/service/BookCategoryService.java

        }

        BookCategory effectiveParent = parent;
        if (effectiveParent == null) {


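The logic that resolves effectiveParent is duplicated in two places, the find method and the create method. I think it could be extracted into a separate method to remove the duplicated code.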

AlphaBs · 2026-01-06T13:03:42Z

src/main/java/book/book/book/repository/BookRepository.java

+            "INSERT IGNORE INTO book (alading_book_id, title, author, isbn, isbn13, category_id, description, publisher, published_date, image_url, aladin_url, aladin_star_rating, chapter_count, diary_count, book_size, weight, generated_quiz_count, created_date, updated_date, page) "
+                    +
+                    "VALUES (:#{#book.aladingBookId}, :#{#book.title}, :#{#book.author}, :#{#book.isbn}, :#{#book.isbn13}, :#{#book.category.id}, :#{#book.description}, :#{#book.publisher}, :#{#book.publishedDate}, :#{#book.imageUrl}, :#{#book.aladinUrl}, :#{#book.aladinStarRating}, :#{#book.chapterCount}, 0, :#{#book.bookSize}, :#{#book.weight}, 0, NOW(), NOW(), :#{#book.page})", nativeQuery = true)
+    int insertIgnoreBook(Book book);


How about the name saveIfAbsent?

sunwon12 (Jan 7, 2026): I'll change it!

AlphaBs · 2026-01-06T13:05:28Z

src/main/java/book/book/book/service/BookService.java

-        }
-        return book;
+    public Book saveBook(AladinSearchResponse.SearchItem item) {
+        Optional<AladinBookDetail> detail = crawlBookDetail(item);


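I'm curious why the DB insert isn't wrapped in the semaphore.

sunwon12 (Jan 7, 2026): The crawler upstream already throttles the rate with a semaphore, so there is no contention for DB connections!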
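
To make the throttling argument concrete, here is a small self-contained sketch that measures how many tasks are ever inside a semaphore-guarded section at once. `ThrottledCrawlSketch` and `runAndMeasurePeak` are hypothetical names, the sleep is a placeholder for the crawl request, and the pool size is an arbitrary choice that is deliberately larger than the permit count. It only shows that the guarded step is bounded by the permits; whether the DB save stays equally bounded depends on it running downstream of that step in the same flow, which is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledCrawlSketch {
    // Runs every task on a pool larger than the permit count and returns the
    // highest number of tasks that were inside the guarded section at once.
    static int runAndMeasurePeak(int taskCount, int permits) throws Exception {
        Semaphore semaphore = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(permits * 3);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(() -> {
                    semaphore.acquire();
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // placeholder for the crawl request
                    } finally {
                        inFlight.decrementAndGet();
                        semaphore.release();
                    }
                    return null;
                }));
            }
            // Wait for every task so the peak reflects the whole run.
            for (Future<?> future : futures) {
                future.get();
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

Calling `ThrottledCrawlSketch.runAndMeasurePeak(40, 10)` should never report a peak above 10, even though 30 threads are available.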

sunwon12 added 17 commits January 4, 2026 13:25

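[Search][Pref] Switch from jsoup to WebClient

7efc0e8

- For connection reuse

[Book][Refactor] Book index naming…

b4101b3

[Book][Pref] Fetch books already in the DB in a single query and minimize connections/transactions

a1a10ec

[Book][Refactor] Set a 10-second timeout on the crawler WebClient

f0c6a26

[Book][Refactor] Limit concurrent crawler requests with a semaphore, per Aladin's spec

06cb0a9

[Book][Fix] Fix the error on find after a failed book save taints the persistence context

6b9227f

- try-catch -> insertIgnore - decision noted in comments…

[Book][Test] Add categories to the dataset

b420516

[Test] Use H2 in MySQL mode in repository tests as well

ad8299a

[Book][Test] Add insert ignore test

77fe5a6

- Because it is a native query, to catch errors early when Book attributes change

[Book][Refactor] Use JDBC bulk insert, update, and delete for chapters when updating them after a crawl

d5f356e

- Also, the delete repositories now use JPQL, for optimization - quiz reset for books and chapters is now owned by quizService rather than book

[Book][Refactor] Add WebClient retry logic

0c46736

[Quiz][Refactor] Bulk delete leaf entities with JPQL

48ff793

[Book][Refactor] Simplify the find-save pattern in search while still accounting for concurrency

c06aebc

- Separate the find function and the save function

[Book][Refactor] Skip saving to bestsellers if the book is already saved

a8d7f8d

- Separate the find function and the save function - because bestseller is a temporary queue entity for saving books…

[Book][Remove] Delete unused code

6c0d32a

[Book][Refactor] Separate the find-save pattern for Naver-crawled book saves as well

2ca548b

[Book][Chore] Clean up SearchService code

fd62b56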

sunwon12 requested a review from AlphaBs January 5, 2026 10:16

sunwon12 changed the title ~~Pref/search~~ [Book][Refactor] Improve readability of the Search -> BookSave flow and the concurrency logic that involves the crawler Jan 5, 2026

sunwon12 self-assigned this Jan 5, 2026

sunwon12 added the ♻️ refactor and pref labels Jan 5, 2026

gemini-code-assist bot reviewed Jan 5, 2026

AlphaBs reviewed Jan 6, 2026

sunwon12 added 2 commits January 8, 2026 00:21

[Book][Refactor] Add WebClient retry and…

d6de01e

[Book][Refactor] Rename category-related methods and split methods

a9692fd

sunwon12 merged commit eb3e797 into dev Jan 7, 2026
2 checks passed

sunwon12 deleted the pref/search branch January 7, 2026 15:45
