About Me
I am a first-year M.S. student in Computer Science and Engineering at UC San Diego, advised by Prof. Hao Zhang. I received my bachelor's degree from ShanghaiTech University, where I was advised by Prof. Kewei Tu. My research lies at the intersection of natural language processing and machine learning systems. I am particularly passionate about designing efficient architectures for long-context modeling and exploring the frontiers of world models, with the goal of bridging system efficiency and model capability.
Currently, I focus on scalable training and inference for generative models. I am the lead author of FlashMHF, in which I proposed a novel multi-head FFN architecture backed by IO-aware Triton/CUDA kernels. I am also a core contributor to FastVideo at Hao AI Lab, where I work on integrating new models and implementing optimized kernels to accelerate video generation systems.
Looking ahead, I aim to extend FlashMHF to a broader range of LLM backbones and to delve deeper into world models within the FastVideo framework. I am also actively exploring retrieval-based methods and continual learning to address the challenges of long-context understanding in foundation models.
Research Interests
- Natural Language Processing
- World Models
- Long-context modeling and ML systems (MLSys)
Publications
Flash Multi-Head Feed-Forward Network
arXiv Preprint, 2025
We propose Flash Multi-Head FFN (FlashMHF), a novel architecture that replaces the standard FFN in Transformers. Backed by IO-aware Triton/CUDA kernels and dynamic sub-networks, FlashMHF reduces peak memory by 3-5× and accelerates inference while outperforming SwiGLU FFNs.
Education
University of California, San Diego
Master of Science in Computer Science and Engineering
Sep 2025 - Jan 2027 (Expected)
La Jolla, CA
University of California, Berkeley
Exchange Student, EECS Department
Aug 2023 - Jan 2024
Berkeley, CA
ShanghaiTech University
Bachelor of Engineering in Computer Science and Technology
Sep 2021 - Jun 2025
Shanghai, China
Projects
Enhancing 3D Character Generation with ControlNet and LoRA
EECS 182/282A | Deep Neural Networks, UC Berkeley, 2023
Explored combining ControlNet and LoRA to improve the controllability and output quality of generative models for 3D character generation.
CUDA/C++ Parallel Image Rendering
Personal Project, 2023
Built a C++ path tracer supporting Lambertian, metal, dielectric, and emissive materials. Implemented motion blur, depth of field, and volumetric effects. Accelerated rendering via CUDA parallelization and importance sampling, achieving a ~200× speedup over a single-threaded CPU baseline.
Neural Radiance Fields (NeRF)
Personal Project, 2023
Built a NeRF rendering pipeline, implementing camera intrinsics and extrinsics and volumetric rendering. Trained and validated the model on an RTX 4090 using open-source multi-view image datasets.