Passionate Systems Engineer focused on bridging the gap between GPU architecture and large-scale AI workloads. Currently, I specialize in optimizing LLM inference and communication efficiency within AMD GPU clusters. My expertise lies in high-performance computing (HPC) environments, specifically enhancing serving frameworks like vLLM through custom scheduling and advanced communication kernels.
My core focus includes:
- Distributed Systems: Expert Parallelism, MoE (Mixture of Experts) load balancing
- Inference Optimization: Efficient KV cache connector and storage designs
- GPU Programming: Deep dive into GPU memory systems and communication libraries to maximize throughput in multi-node environments
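As a small illustration of the kind of problem the MoE load-balancing work above involves, here is a minimal sketch in plain Python (all names hypothetical, not vLLM APIs) of top-k token-to-expert routing and the per-expert load counts that balancing tries to keep even across GPUs:

```python
# Sketch of top-k token-to-expert routing for MoE serving.
# Hypothetical helper names for illustration only.
from collections import Counter

def route_tokens(gate_scores, top_k=2):
    """For each token, pick the top_k experts by gate score.

    gate_scores: one list of per-expert scores per token.
    Returns one list of expert indices per token.
    """
    assignments = []
    for scores in gate_scores:
        ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
        assignments.append(ranked[:top_k])
    return assignments

def expert_load(assignments, num_experts):
    """Count how many tokens each expert receives -- the quantity
    load balancing tries to equalize across devices."""
    counts = Counter(e for token in assignments for e in token)
    return [counts.get(e, 0) for e in range(num_experts)]

# Example: 3 tokens routed across 4 experts.
scores = [
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
assign = route_tokens(scores, top_k=2)
print(expert_load(assign, num_experts=4))  # -> [2, 2, 1, 1]
```

In a real serving stack the routing runs on-device and the load statistics feed expert placement or capacity decisions; this sketch only shows the bookkeeping.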
I am driven by the challenge of making large-scale AI models more accessible and efficient through low-level systems engineering.
## Technical Skills
| Category | Tools & Technologies |
|---|---|
| Hardware | Verilog, SystemVerilog |
| Software | C/C++, Python |
| Parallel Programming | CUDA, HIP, PyTorch |
| LLM Serving | vLLM (expert), SGLang |
| Profiling | perf, Nsight Systems/Compute, rocprof |
| Container | Docker, Kubernetes |
## Education
M.S. in Computer Hardware Engineering, Korea University · 2023.03 – 2025.02

B.S. in Electronic Engineering, Chung-Ang University · 2017.03 – 2023.02