김태오

async 작업 시 thread 의 traceId 전파 본문

Spring Boot

async 작업 시 thread 의 traceId 전파

ystc1247 2024. 3. 9. 20:07

Spring Boot 배포 환경 시, 서버 로직에서 오류가 발생했을 때, 쉽게 오류를 찾을 수 있는 방법으로 traceId 를 사용하여 로그가 저장되어 있는 곳을 탐색하는 것이다. 예로 AWS CloudWatch 의 로그 그룹에서 "${traceId}" 등으로 검색하는 방법이 있겠다.

 

그렇기에 traceId 가 원활히 로깅되는 것이 중요한데, single thread 동기 작업에서는 크게 어렵지 않다.

 

그러나 비동기적 작업을 원할 시, 로직이 돌며 다른 thread 에서 작업이 이어지는 경우가 있는데, 이런 경우에서 원 쓰레드(부모 쓰레드) 의 traceId 가 작업 쓰레드(자식 쓰레드)로 전파되는 데에 문제가 발생한다. 이를 전파할 수 있어야 작업 쓰레드에서 문제가 발생했을 경우 traceId 를 타고 들어가 효율적인 오류 탐색이 가능하다.

 

Spring Boot 2.x.x 버젼에서는 Spring Sleuth 를 사용하여 traceId propogation 이 원활했지만, 3.x.x 로 넘어오면서 이게 deprecate 되어버렸다. 그래서 micrometer 라는 놈을 써야되는데, metric 을 monitoring tool 로 넘길 때 주로 사용한다. 이걸 쓰면서 발생하는 문제가 이전에 언급한 traceId propogation 이 기본적으로 이루어지지 않는다는 것이다. Spring Security 도 그렇고 3.x 로 기존 프로젝트 로직들을 이관할 때 뭔가 계속 이슈가 발생하는 것 같다.

implementation("io.micrometer", "micrometer-observation")
implementation("io.micrometer", "micrometer-tracing")
implementation("io.micrometer", "micrometer-tracing-bridge-brave")
implementation("io.micrometer", "context-propagation")

 

얘네를 추가해주면 되는데, propogation 관련하여는 https://github.com/micrometer-metrics/context-propagation에 오픈소스로 올라와있다.

@Configuration
class AsynchronousConfiguration: AsyncConfigurer {
    @Bean
    fun threadPoolTaskExecutor(): ThreadPoolTaskExecutor = ThreadPoolTaskExecutor().apply {
        corePoolSize = CORE_POOL_SIZE
        maxPoolSize = MAX_POOL_SIZE
        queueCapacity = QUEUE_CAPACITY
        setThreadNamePrefix("async-")
    }

    @Bean(ASYNC_TASK_EXECUTOR_BEAN_NAME)
    fun asyncTaskExecutor(delegate: ThreadPoolTaskExecutor): DelegatingSecurityContextAsyncTaskExecutor {
        return DelegatingSecurityContextAsyncTaskExecutor(delegate)
    }

    companion object {
        const val ASYNC_TASK_EXECUTOR_BEAN_NAME = "asyncTaskExecutor"
        const val CORE_POOL_SIZE = 10
        const val MAX_POOL_SIZE = 50
        const val QUEUE_CAPACITY = 50
    }
}

 

무난한 Asynchronous Configuration class 를 가정하자. 빈으로 threadPoolTaskExecutor function 을 선언하여 ThreadPookExecutor 의 풀사이즈 등을 설정한다.

 

이후 비동기 작업시 Async value 에 configuration 을 사용할 수 있다.

 

간단한 service method 를 만들어 비동기 쓰레드를 관찰해보자.

 

하나의 메소드 안에서 다른 메소드를 호출하여 부모-자식 관계 느낌의 thread 가 생성되도록 한다.

 

@Async(AsynchronousConfiguration.ASYNC_TASK_EXECUTOR_BEAN_NAME)
    fun startParentTask(traceId: String) {

        val parentThreadName = Thread.currentThread().name
        log.info("Parent task started with traceId: $traceId on thread: $parentThreadName")

        childService.startChildTask(traceId)

        Thread.sleep(1000)

        log.info("Parent task completed with traceId: $traceId on thread: $parentThreadName")
    }

 

우선 부모 메소드이다. 로그에서 thread tracing 을 위해 현재 thread 이름과 함께 임의로 생성된 traceId 를 인자로 넘겨받아 slf4j logging 을 한다.

Thread 에 sleep 을 걸어 로직이 돌아가는 시간을 임의로 강제해본다.

 

@Async(AsynchronousConfiguration.ASYNC_TASK_EXECUTOR_BEAN_NAME)
    fun startChildTask(traceId: String) {
        val childThreadName = Thread.currentThread().name
        val traceId = Thread.currentThread().stackTrace
        log.info("Child task started with traceId: $traceId on thread: $childThreadName")

        Thread.sleep(1000)

        log.info("Child task completed with traceId: $traceId on thread: $childThreadName")
    }

 

자식 메소드이다. 비슷한 느낌으로 구현한다.

 

@GetMapping("/test-trace")
    fun testTraceId(): String {
        val traceId = java.util.UUID.randomUUID().toString()
        log.info("Starting asynchronous job with traceId: $traceId")
        parentService.startParentTask(traceId)
        return "Asynchronous job started with traceId: $traceId"
    }

 

테스트를 위한 간단한 컨트롤러다. 한번에 여러 request 를 날리기 위해 여기에서도 간단히 로깅을 하자. 이전에 인자로 받았던 traceId 는 여기에서 무작위로 생성한다.

logging:
  pattern:
    dateformat: "yyyy-MM-dd HH:mm:ss.SSS"
    level: "%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]"

traceId 와 spanId 를 편하게 볼 수 있도록 로깅한다. 

 

이제 propogation 을 위한 빈 생성과 decorator 를 단다.

 @Bean
    fun taskDecorator(): TaskDecorator {
        return TaskDecorator { runnable: Runnable? ->
            ContextSnapshot.captureAll(*arrayOfNulls(0)).wrap(
                runnable!!,
            )
        }
    }

 

https://blog.gangnamunni.com/post/mdc-context-task-decorator/ 좀 예전글이긴 한데 비슷한 경우에서 이슈를 해결한 강남언니 기술블로그 글이다.

 

https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/core/task/TaskDecorator.html TaskDecorator 공식문서인데 대충 뭐하는 놈인지 보면 이름 그대로 task 를 꾸며주는 (monitoring, metric 등을 위해) 친구다.

 

빈을 생성했으니 아까 만들어놓았던 threadPoolTaskExecutor 에 이를 추가하자.

@Bean
    fun threadPoolTaskExecutor(): ThreadPoolTaskExecutor = ThreadPoolTaskExecutor().apply {
        corePoolSize = CORE_POOL_SIZE
        maxPoolSize = MAX_POOL_SIZE
        queueCapacity = QUEUE_CAPACITY
        setThreadNamePrefix("async-")
        setTaskDecorator(taskDecorator()) << 이거 추가
    }

 

이제 api 를 여러개 발사하여 로깅이 잘 되는지 확인해보자.

2024-03-09 19:51:03.182  INFO [,65ec3f17e6d38686445bf9e99014b824,445bf9e99014b824] 25486 --- [nio-8080-exec-1] kr.projects.trace.TraceController        : Starting asynchronous job with traceId: 3f0f0f1e-c0b3-4cef-99e3-7cbfd809e164
2024-03-09 19:51:03.183  INFO [,65ec3f17e6d38686445bf9e99014b824,445bf9e99014b824] 25486 --- [nio-8080-exec-1] kr.projects.trace.ParentService          : Parent task started with traceId: 3f0f0f1e-c0b3-4cef-99e3-7cbfd809e164 on thread: http-nio-8080-exec-1
2024-03-09 19:51:03.183  INFO [,65ec3f17e6d38686445bf9e99014b824,445bf9e99014b824] 25486 --- [nio-8080-exec-1] kr.projects.trace.ChildService           : Child task started with traceId: [Ljava.lang.StackTraceElement;@6223d89e on thread: http-nio-8080-exec-1
2024-03-09 19:51:04.189  INFO [,65ec3f17e6d38686445bf9e99014b824,445bf9e99014b824] 25486 --- [nio-8080-exec-1] kr.projects.trace.ChildService           : Child task completed with traceId: [Ljava.lang.StackTraceElement;@6223d89e on thread: http-nio-8080-exec-1
2024-03-09 19:51:05.191  INFO [,65ec3f17e6d38686445bf9e99014b824,445bf9e99014b824] 25486 --- [nio-8080-exec-1] kr.projects.trace.ParentService          : Parent task completed with traceId: 3f0f0f1e-c0b3-4cef-99e3-7cbfd809e164 on thread: http-nio-8080-exec-1
2024-03-09 19:51:44.421  INFO [,65ec3f40987e7c543582282a039a134d,3582282a039a134d] 25486 --- [nio-8080-exec-2] kr.projects.trace.TraceController        : Starting asynchronous job with traceId: dfebee98-f8e6-450e-be36-bd9703a46923
2024-03-09 19:51:44.434  INFO [,65ec3f40987e7c543582282a039a134d,3582282a039a134d] 25486 --- [nio-8080-exec-2] kr.projects.trace.ParentService          : Parent task started with traceId: dfebee98-f8e6-450e-be36-bd9703a46923 on thread: http-nio-8080-exec-2
2024-03-09 19:51:44.436  INFO [,65ec3f40987e7c543582282a039a134d,3582282a039a134d] 25486 --- [nio-8080-exec-2] kr.projects.trace.ChildService           : Child task started with traceId: [Ljava.lang.StackTraceElement;@7cf9c5a5 on thread: http-nio-8080-exec-2
2024-03-09 19:51:44.620  INFO [,65ec3f405ae8bf117fd3ec3d607efd2d,7fd3ec3d607efd2d] 25486 --- [nio-8080-exec-3] kr.projects.trace.TraceController        : Starting asynchronous job with traceId: 3e7b5aaf-0ee9-477b-9db9-0a8d98746945
2024-03-09 19:51:44.637  INFO [,65ec3f405ae8bf117fd3ec3d607efd2d,7fd3ec3d607efd2d] 25486 --- [nio-8080-exec-3] kr.projects.trace.ParentService          : Parent task started with traceId: 3e7b5aaf-0ee9-477b-9db9-0a8d98746945 on thread: http-nio-8080-exec-3
2024-03-09 19:51:44.637  INFO [,65ec3f405ae8bf117fd3ec3d607efd2d,7fd3ec3d607efd2d] 25486 --- [nio-8080-exec-3] kr.projects.trace.ChildService           : Child task started with traceId: [Ljava.lang.StackTraceElement;@7ebc7ff2 on thread: http-nio-8080-exec-3
2024-03-09 19:51:44.824  INFO [,65ec3f4064b2ba57b991da1e9a58f904,b991da1e9a58f904] 25486 --- [nio-8080-exec-4] kr.projects.trace.TraceController        : Starting asynchronous job with traceId: db6dee41-5500-4f49-a902-d4ed52dfdf1a
2024-03-09 19:51:44.825  INFO [,65ec3f4064b2ba57b991da1e9a58f904,b991da1e9a58f904] 25486 --- [nio-8080-exec-4] kr.projects.trace.ParentService          : Parent task started with traceId: db6dee41-5500-4f49-a902-d4ed52dfdf1a on thread: http-nio-8080-exec-4
2024-03-09 19:51:44.825  INFO [,65ec3f4064b2ba57b991da1e9a58f904,b991da1e9a58f904] 25486 --- [nio-8080-exec-4] kr.projects.trace.ChildService           : Child task started with traceId: [Ljava.lang.StackTraceElement;@2c146429 on thread: http-nio-8080-exec-4
2024-03-09 19:51:45.023  INFO [,65ec3f4105c42d6e1380816172851f52,1380816172851f52] 25486 --- [nio-8080-exec-5] kr.projects.trace.TraceController        : Starting asynchronous job with traceId: 08d2f0c0-b588-4a84-a65c-8d6c1a06993f
2024-03-09 19:51:45.024  INFO [,65ec3f4105c42d6e1380816172851f52,1380816172851f52] 25486 --- [nio-8080-exec-5] kr.projects.trace.ParentService          : Parent task started with traceId: 08d2f0c0-b588-4a84-a65c-8d6c1a06993f on thread: http-nio-8080-exec-5
2024-03-09 19:51:45.024  INFO [,65ec3f4105c42d6e1380816172851f52,1380816172851f52] 25486 --- [nio-8080-exec-5] kr.projects.trace.ChildService           : Child task started with traceId: [Ljava.lang.StackTraceElement;@27562cf1 on thread: http-nio-8080-exec-5
2024-03-09 19:51:45.220  INFO [,65ec3f41b88d12392f0aed8c54e60a0a,2f0aed8c54e60a0a] 25486 --- [nio-8080-exec-6] kr.projects.trace.TraceController        : Starting asynchronous job with traceId: 26adfe5e-2730-4a31-97eb-49c0eacb9a28
2024-03-09 19:51:45.220  INFO [,65ec3f41b88d12392f0aed8c54e60a0a,2f0aed8c54e60a0a] 25486 --- [nio-8080-exec-6] kr.projects.trace.ParentService          : Parent task started with traceId: 26adfe5e-2730-4a31-97eb-49c0eacb9a28 on thread: http-nio-8080-exec-6
2024-03-09 19:51:45.220  INFO [,65ec3f41b88d12392f0aed8c54e60a0a,2f0aed8c54e60a0a] 25486 --- [nio-8080-exec-6] kr.projects.trace.ChildService           : Child task started with traceId: [Ljava.lang.StackTraceElement;@6b7c68eb on thread: http-nio-8080-exec-6
2024-03-09 19:51:45.438  INFO [,65ec3f40987e7c543582282a039a134d,3582282a039a134d] 25486 --- [nio-8080-exec-2] kr.projects.trace.ChildService           : Child task completed with traceId: [Ljava.lang.StackTraceElement;@7cf9c5a5 on thread: http-nio-8080-exec-2
2024-03-09 19:51:45.638  INFO [,65ec3f405ae8bf117fd3ec3d607efd2d,7fd3ec3d607efd2d] 25486 --- [nio-8080-exec-3] kr.projects.trace.ChildService           : Child task completed with traceId: [Ljava.lang.StackTraceElement;@7ebc7ff2 on thread: http-nio-8080-exec-3
2024-03-09 19:51:45.829  INFO [,65ec3f4064b2ba57b991da1e9a58f904,b991da1e9a58f904] 25486 --- [nio-8080-exec-4] kr.projects.trace.ChildService           : Child task completed with traceId: [Ljava.lang.StackTraceElement;@2c146429 on thread: http-nio-8080-exec-4
2024-03-09 19:51:46.026  INFO [,65ec3f4105c42d6e1380816172851f52,1380816172851f52] 25486 --- [nio-8080-exec-5] kr.projects.trace.ChildService           : Child task completed with traceId: [Ljava.lang.StackTraceElement;@27562cf1 on thread: http-nio-8080-exec-5
2024-03-09 19:51:46.224  INFO [,65ec3f41b88d12392f0aed8c54e60a0a,2f0aed8c54e60a0a] 25486 --- [nio-8080-exec-6] kr.projects.trace.ChildService           : Child task completed with traceId: [Ljava.lang.StackTraceElement;@6b7c68eb on thread: http-nio-8080-exec-6
2024-03-09 19:51:46.442  INFO [,65ec3f40987e7c543582282a039a134d,3582282a039a134d] 25486 --- [nio-8080-exec-2] kr.projects.trace.ParentService          : Parent task completed with traceId: dfebee98-f8e6-450e-be36-bd9703a46923 on thread: http-nio-8080-exec-2
2024-03-09 19:51:46.639  INFO [,65ec3f405ae8bf117fd3ec3d607efd2d,7fd3ec3d607efd2d] 25486 --- [nio-8080-exec-3] kr.projects.trace.ParentService          : Parent task completed with traceId: 3e7b5aaf-0ee9-477b-9db9-0a8d98746945 on thread: http-nio-8080-exec-3
2024-03-09 19:51:46.830  INFO [,65ec3f4064b2ba57b991da1e9a58f904,b991da1e9a58f904] 25486 --- [nio-8080-exec-4] kr.projects.trace.ParentService          : Parent task completed with traceId: db6dee41-5500-4f49-a902-d4ed52dfdf1a on thread: http-nio-8080-exec-4
2024-03-09 19:51:47.032  INFO [,65ec3f4105c42d6e1380816172851f52,1380816172851f52] 25486 --- [nio-8080-exec-5] kr.projects.trace.ParentService          : Parent task completed with traceId: 08d2f0c0-b588-4a84-a65c-8d6c1a06993f on thread: http-nio-8080-exec-5
2024-03-09 19:51:47.226  INFO [,65ec3f41b88d12392f0aed8c54e60a0a,2f0aed8c54e60a0a] 25486 --- [nio-8080-exec-6] kr.projects.trace.ParentService          : Parent task completed with traceId: 26adfe5e-2730-4a31-97eb-49c0eacb9a28 on thread: http-nio-8080-exec-6

 

매 줄 INFO 뒤 대괄호 속 두번째, 세번째 값이 각각 traceId 와 spanId 인데, 로직이 돌아갈 때 thread 가 변경되었음에도 기존의 traceId 가 잘 전파됨을 확인할 수 있다.