Passionate Systems Engineer focused on bridging the gap between GPU architecture and large-scale AI workloads. Currently, I specialize in optimizing LLM inference and communication efficiency within AMD GPU clusters. My expertise lies in high-performance computing (HPC) environments, specifically enhancing serving frameworks like vLLM through custom scheduling and advanced communication kernels.
My core focus includes:
- Distributed Systems: Expert Parallelism, MoE (Mixture of Experts) load balancing
- Inference Optimization: Efficient KV cache connector and storage designs
- GPU Programming: Deep dive into GPU memory systems and communication libraries to maximize throughput in multi-node environments
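As a small illustration of the kind of problem the MoE load-balancing work above involves, here is a minimal sketch in plain Python (all names hypothetical, not vLLM APIs) of top-k token-to-expert routing and the per-expert load counts that balancing tries to keep even across GPUs:

```python
# Sketch of top-k token-to-expert routing for MoE serving.
# Hypothetical helper names for illustration only.
from collections import Counter

def route_tokens(gate_scores, top_k=2):
    """For each token, pick the top_k experts by gate score.

    gate_scores: one list of per-expert scores per token.
    Returns one list of expert indices per token.
    """
    assignments = []
    for scores in gate_scores:
        ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
        assignments.append(ranked[:top_k])
    return assignments

def expert_load(assignments, num_experts):
    """Count how many tokens each expert receives -- the quantity
    load balancing tries to equalize across devices."""
    counts = Counter(e for token in assignments for e in token)
    return [counts.get(e, 0) for e in range(num_experts)]

# Example: 3 tokens routed across 4 experts.
scores = [
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
assign = route_tokens(scores, top_k=2)
print(expert_load(assign, num_experts=4))  # -> [2, 2, 1, 1]
```

In a real serving stack the routing runs on-device and the load statistics feed expert placement or capacity decisions; this sketch only shows the bookkeeping.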
I am driven by the challenge of making large-scale AI models more accessible and efficient through low-level systems engineering.
## Technical Skills
| Category | Tools & Technologies |
|---|---|
| Hardware | Verilog, SystemVerilog |
| Software | C/C++, Python |
| Parallel Programming | CUDA, HIP, PyTorch |
| LLM Serving | vLLM (expert), SGLang |
| Profiling | perf, Nsight Systems/Compute, rocprof |
| Container | Docker, Kubernetes |
## Education
M.S. in Computer Hardware Engineering, Korea University · 2023.03 – 2025.02

B.S. in Electronic Engineering, Chung-Ang University · 2017.03 – 2023.02