About Me
I am a first-year M.S. student in Computer Science at UC San Diego, advised by Prof. Hao Zhang. I hold a B.S. from ShanghaiTech University, where I was advised by Prof. Kewei Tu. My research lies at the intersection of Natural Language Processing and Machine Learning Systems. I am particularly interested in designing efficient architectures for Long-Context Modeling and in exploring World Models as a way to bridge system efficiency and model capability.
Currently, I focus on scalable training and inference for generative models. I am the lead author of FlashMHF, where I proposed a novel Multi-Head FFN architecture backed by IO-aware Triton/CUDA kernels. As a core contributor to FastVideo at Hao AI Lab, I train action-conditioned video generation models on distributed multi-node clusters, and I contributed to Dreamverse, which achieves real-time 1080p video generation. Previously, at Alibaba Ant Group, I integrated Hierarchical Sparse Attention into the SGLang inference framework and built custom Flash GPU kernels in ThunderKittens/CUDA/Triton.
Looking ahead, I aim to extend my work on FlashMHF to broader LLM backbones and delve deeper into World Models within the FastVideo framework. I am also actively exploring retrieval-based methods and Continual Learning to solve the challenges of long-context understanding in foundation models.
Publications
Flash Multi-Head Feed-Forward Network
arXiv Preprint, 2025
We propose Flash Multi-Head FFN (FlashMHF), a novel architecture replacing standard FFNs in Transformers. Backed by IO-aware Triton/CUDA kernels and dynamic sub-networks, FlashMHF reduces peak memory by 3-5x and accelerates inference while improving performance over SwiGLU.
Projects
Dreamverse: Realtime Video Generation
Project, Mar 2026
Generates a 30-second 1080p clip after a 4.55-second wait on a single GPU. Improved generation consistency by modifying the video model pipeline, and accelerated backend inference by benchmarking and fusing kernels.
FastVideo
Open-Source Project, Oct 2025 - Present
Building scalable and efficient training infrastructure for video generation. Training action-conditioned world models and accelerating inference with state-of-the-art distillation methods. Proposed a novel data curation pipeline for high-quality action-labeled video datasets.
Enhancing 3D Character Generation with ControlNet and LoRA
EECS 182/282A | Deep Neural Networks, UC Berkeley, 2023
Explored ControlNet and LoRA for 3D character generation, aiming for finer control and higher output quality.
CUDA/C++ Parallel Image Rendering
Personal Project, 2023
Built a C++ path tracer supporting Lambertian, metal, dielectric, and emissive materials. Implemented motion blur, depth of field, and volumetric effects. Accelerated rendering with CUDA parallelization and importance sampling, achieving a ~200× speedup over a single-threaded CPU baseline.
NeRF Neural Network
Personal Project, 2023
Built a NeRF rendering pipeline covering camera intrinsics and extrinsics and volumetric rendering. Trained and validated the model on an RTX 4090 using open-source multi-view image datasets.
Education
University of California, San Diego
Sep 2025 - Dec 2026 (Expected)
Master of Science in Computer Science and Engineering
La Jolla, CA
University of California, Berkeley
Aug 2023 - Jan 2024
Exchange Student, EECS Department
Berkeley, CA
ShanghaiTech University
Sep 2021 - Jun 2025
Bachelor of Engineering in Computer Science and Technology
Shanghai, China