Minshen Zhang - UC San Diego

About Me

I am an M.S. student in Computer Science at UC San Diego, advised by Prof. Hao Zhang in the Hao AI Lab. I hold a B.Eng. from ShanghaiTech University, where I was advised by Prof. Kewei Tu. My research lies at the intersection of machine learning systems and generative models, with a focus on efficient, scalable video generation and world models.

At the Hao AI Lab, I am a core contributor to FastVideo, an open-source framework for fast and scalable video generation. I helped build and ship DreamVerse, a real-time video generation workspace that streams 1080p clips on a single GPU; my contributions there were audio–visual continuity in the generation pipeline and faster backend inference through kernel benchmarking and fusion. I also train action-conditioned world models on multi-node H200 clusters and contribute training infrastructure, custom GPU kernels, and code reviews across the project.

Before UC San Diego, I was the lead author of FlashMHF, a Multi-Head FFN architecture backed by IO-aware Triton/CUDA kernels that cuts peak memory by 3–5x. At Alibaba Ant Group, I integrated Hierarchical Sparse Attention into the SGLang inference framework and built custom Flash GPU kernels in ThunderKittens/CUDA/Triton.


Publications

Flash Multi-Head Feed-Forward Network

Minshen Zhang*, Xiang Hu*, Jianguo Li, Wei Wu, Kewei Tu

arXiv Preprint, 2025

We propose Flash Multi-Head FFN (FlashMHF), a novel architecture replacing standard FFNs in Transformers. Backed by IO-aware Triton/CUDA kernels and dynamic sub-networks, FlashMHF reduces peak memory by 3-5x and accelerates inference while improving performance over SwiGLU.

[Paper] [Code] [Project Page] [BibTeX]

Projects

DreamVerse: Real-Time Video Generation

Hao AI Lab (Core Contributor)

Open-Source Release, May 2026

A real-time video generation workspace for "vibe directing": steering generation through natural-language iteration instead of one-shot prompting. It is built on the open-weights LTX-2 model with a FastVideo backend runtime and a Blackwell-optimized pipeline (NVFP4 inference, FA4, torch.compile), and it streams 30-second 1080p clips with under 5 seconds of wait on a single NVIDIA B200. As a core contributor, I built session-based audio–visual continuity for seamless multi-segment generation and accelerated backend inference through kernel benchmarking and fusion.

[OSS Blog] [Demo Blog] [Demo] [Code]

FastVideo

Hao AI Lab (Core Contributor)

Open-Source Project, Oct 2025 - Present

I build scalable and efficient training infrastructure for video generation, train action-conditioned world models, and accelerate inference with state-of-the-art distillation methods. I also proposed a novel data curation pipeline for high-quality action-labeled video datasets.

[Open-Source Repo]

Enhancing 3D Character Generation with ControlNet and LoRA

Congrong Xu, Zhanhe Shi, Minshen Zhang, Qingcheng Zhao

EECS 182/282A | Deep Neural Networks, UC Berkeley, 2023

A course project exploring ControlNet and LoRA for 3D character generation, aiming for finer control and higher output quality.

[PDF] [Code] [Project Page] [BibTeX]

CUDA/C++ Parallel Image Rendering

Minshen Zhang

Personal Project, 2023

Built a C++ path tracer supporting Lambertian, metal, dielectric, and emissive materials. Implemented motion blur, depth of field, and volumetric effects. Accelerated rendering via CUDA parallelization and importance sampling, achieving a ~200× speedup over a single-threaded CPU baseline.

[C++ Version] [CUDA Version]

Education

University of California, San Diego

Sep 2025 - Dec 2026 (Expected)

Master of Science in Computer Science and Engineering

La Jolla, CA

University of California, Berkeley

Aug 2023 - Jan 2024

Exchange Student, EECS Department

Berkeley, CA

ShanghaiTech University

Sep 2021 - Jun 2025

Bachelor of Engineering in Computer Science and Technology

Shanghai, China

Honors & Awards

2025 Outstanding Graduate of ShanghaiTech University

2024 Outstanding Student, ShanghaiTech University

2024 Teaching Assistant, CS100 Computer Programming, ShanghaiTech University

2022 Outstanding Student, ShanghaiTech University

Experience

Graduate Research Assistant

Machine Learning Engineer, Project Leader

Undergraduate Research Intern
