About Me


AI System Software Engineer

Passionate systems engineer focused on bridging the gap between GPU architecture and large-scale AI workloads. I currently specialize in optimizing LLM inference and communication efficiency on AMD GPU clusters. My expertise lies in high-performance computing (HPC) environments, specifically in enhancing serving frameworks such as vLLM through custom scheduling and advanced communication kernels.

My core focus includes:

  • Distributed Systems: Expert Parallelism, MoE (Mixture of Experts) load balancing
  • Inference Optimization: Implementing efficient KV cache connector & storage designs
  • GPU Programming: Deep dive into GPU memory systems and communication libraries to maximize throughput in multi-node environments

I am driven by the challenge of making large-scale AI models more accessible and efficient through low-level systems engineering.


Technical Skills

Category               Tools & Technologies
Hardware               Verilog / SystemVerilog
Software               C/C++, Python
Parallel Programming   CUDA, HIP, PyTorch
LLM Serving            vLLM (expert), SGLang
Profiling              perf, Nsight Systems/Compute, rocprof
Containers             Docker, Kubernetes

Education

M.S. in Computer Hardware Engineering, Korea University · 2023.03 – 2025.02

B.S. in Electronic Engineering, Chung-Ang University · 2017.03 – 2023.02