Minshen Zhang

Computer Science Graduate, NLP Researcher, Developer

Shanghai / San Diego

miz055@ucsd.edu

(+86)19946167252

GitHub

About Me

Hi there! My name is Minshen Zhang. You can call me Alex.

I recently graduated with a Bachelor's degree in Computer Science and Technology from ShanghaiTech University in June 2025. I will be pursuing a Master's degree in Computer Science at UC San Diego starting Fall 2025.

My research interests focus on Natural Language Processing and Machine Learning. I am currently studying Transformer internals and exploring how to design faster and better MLP modules. I am also investigating how Sparse Autoencoder (SAE) techniques can mitigate hallucination in large language models during retrieval-augmented generation (RAG).

I'm open to academic and professional collaborations. If you are interested, please feel free to email me.

Education

2025.09 - (expected)

UC San Diego

Master of Science in Computer Science (CS75)

2021.09 - 2025.06

ShanghaiTech University

Bachelor of Computer Science and Technology

Overall GPA: 3.69/4.0; Major GPA: 3.8+/4.0

English Tests: TOEFL(100), GRE(320), CET6(602), CET4(607), Duolingo(135)

2023.08 - 2024.01

UC Berkeley Extension

Berkeley GLOBE Student Program

Overall GPA: 4.0/4.0

Research Experience

2025.07 - Present

Shanghai Alibaba Ant Group NLP Research Lab

Research Intern of NLP Research Group

Mentor: Xiang Hu

Observed that the multi-subspace learning of Multi-Head Attention significantly enhances the expressiveness of the attention mechanism. Given the similarity of their mathematical formulations, we hypothesized that the MLP component of a Transformer block can be decomposed into multi-head computations in the same way.

Designed an integration of multi-head MLPs into standard Transformer blocks, along with a corresponding Triton kernel that improves throughput (1.06×) and reduces memory usage (over 2×) compared with the LlamaMLP (SwiGLU) baseline; project lead.
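The decomposition idea above can be sketched in a few lines of numpy. This is a minimal illustration, not the project's actual implementation: the head counts, weight shapes, and the choice of summing head outputs are all assumptions for the sake of the example.

```python
import numpy as np

def silu(x):
    # SiLU activation, as used in SwiGLU-style MLPs
    return x / (1.0 + np.exp(-x))

def multihead_swiglu_mlp(x, W_gate, W_up, W_down):
    """Sketch of an MLP decomposed into independent heads.

    x:             (d_model,) input vector
    W_gate, W_up:  (H, d_model, d_head) per-head input projections
    W_down:        (H, d_head, d_model) per-head output projections
    Each head operates in its own subspace; the head outputs are
    summed, mirroring how multi-head attention combines per-head
    values after projection.
    """
    H = W_gate.shape[0]
    out = np.zeros_like(x)
    for h in range(H):
        gate = silu(x @ W_gate[h])          # (d_head,)
        up = x @ W_up[h]                    # (d_head,)
        out = out + (gate * up) @ W_down[h] # back to (d_model,)
    return out

rng = np.random.default_rng(0)
d_model, d_head, H = 16, 8, 4
x = rng.standard_normal(d_model)
Wg = rng.standard_normal((H, d_model, d_head)) * 0.1
Wu = rng.standard_normal((H, d_model, d_head)) * 0.1
Wd = rng.standard_normal((H, d_head, d_model)) * 0.1
y = multihead_swiglu_mlp(x, Wg, Wu, Wd)
```

In practice the per-head loop would be fused into a single batched kernel (the role of the Triton implementation); the loop form here just makes the per-head subspaces explicit.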

2025.02 - Present

ShanghaiTech Kewei Tu's Lab

Research Assistant of NLP Research Lab

Mentor: Prof. Kewei Tu (ShanghaiTech University)

Building on Have I Learned This Entity? (ICLR 2025), using sparse autoencoders to test whether an LLM encodes entity-specific knowledge in context and to identify the corresponding sparse latent directions. Applying these directions to downstream tasks such as feedback-guided RAG retrieval, RAG hallucination detection, and latent-direction steering; project lead.

2024.06 - 2025.01

Shanghai Qizhi Institute

Research Intern of Machine Learning Research Group

Mentor: Prof. Tao Du (Tsinghua University)

Derived an analytical solution to the real-time inverse-refraction problem using physical optics, linear algebra, and numerical methods. Developed real-time machine learning algorithms for water-surface reconstruction to assist robot control tasks: based on Snell's law, derived the relationship between water-surface normal vectors and visual information, establishing the feasibility of learning-based approaches. Implemented the core algorithms with NVIDIA Warp (CUDA), reducing per-frame time from 500 ms to 8 ms (over 60× faster).
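The forward model that the inverse-refraction derivation inverts is the vector form of Snell's law. A minimal sketch, assuming unit vectors, a known refractive-index ratio, and no total internal reflection (the project's actual formulation may differ):

```python
import numpy as np

def refract(d, n, eta):
    """Vector form of Snell's law.

    d:   unit incident direction
    n:   unit surface normal, facing the incoming ray
    eta: n1 / n2, the ratio of refractive indices
    Returns the unit refracted direction; assumes no total
    internal reflection (sin^2 of the refracted angle <= 1).
    """
    cos_i = -np.dot(d, n)
    sin2_t = eta**2 * (1.0 - cos_i**2)
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

# Air -> water (n1 = 1.0, n2 = 1.33), 45-degree incidence.
eta = 1.0 / 1.33
d = np.array([np.sin(np.pi / 4), -np.cos(np.pi / 4), 0.0])
n = np.array([0.0, 1.0, 0.0])
t = refract(d, n, eta)
```

Inverting this relation means recovering `n` from observed incident and refracted directions, which is what ties the surface normals to the visual information in the reconstruction task.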

2023.04 - 2024.01

Shanghai ElanTech Lab

Embodied Intelligence Research Group

Advisor: Prof. Lan Xu (ShanghaiTech University)

Implemented and improved network architectures including ResNet, BiRNN, diffusion models, and Transformers. Built a large-scale human motion capture facility using OptiTrack and Z CAM camera arrays for data collection. Integrated existing algorithms with the Motion Diffusion Model to reconstruct human motion from IMU sensor signals.

Projects

Arximia: LLM-Agent Based Scientific Collaboration Platform

12/2024 - Present

Building a comprehensive platform that helps scientists worldwide read, search, discuss, reference, and manage scientific papers. Using React (frontend) + Python (backend) to create an integrated service whose core functionality is powered by LLM agents.

This project is currently under development and will be available at arximia.com.

Ray Tracing Rendering Engine

10/2023

Built a C++ ray tracer implementing diffuse Lambertian surfaces, fully and partially reflective metals, semi-transparent dielectrics, light sources, and other materials. Rewrote the entire codebase in NVIDIA CUDA C++ and added importance sampling to accelerate rendering, achieving a 200× speedup.
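The importance-sampling idea mentioned above can be demonstrated with the standard Lambertian case: sampling directions proportional to cos(θ) makes the Monte Carlo estimator of the hemisphere integral of cos(θ) exact. This sketch uses Malley's method and is an illustration of the technique, not the renderer's actual sampler.

```python
import numpy as np

def cosine_weighted_sample(u1, u2):
    # Malley's method: sample a unit disk uniformly, then project
    # up onto the hemisphere; the resulting pdf is cos(theta) / pi.
    r = np.sqrt(u1)
    phi = 2.0 * np.pi * u2
    x = r * np.cos(phi)
    y = r * np.sin(phi)
    z = np.sqrt(np.maximum(0.0, 1.0 - u1))  # z = cos(theta)
    return x, y, z

rng = np.random.default_rng(2)
N = 10_000
u1, u2 = rng.random(N), rng.random(N)
_, _, cos_theta = cosine_weighted_sample(u1, u2)

# Estimate the hemisphere integral of cos(theta) (true value: pi).
# With pdf = cos(theta)/pi, every per-sample estimate equals pi,
# so this importance-sampled estimator has zero variance here.
pdf = cos_theta / np.pi
estimate = np.mean(cos_theta / pdf)
```

Matching the sampling density to the integrand is what cuts variance; in a renderer the same idea is applied to BRDF lobes and light sources rather than this textbook integral.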

Skills & Interests

Technical Skills

  • Machine Learning & AI (PyTorch, Triton, pybind11)
  • Python (extensive experience with high-performance ML/AI)
  • C/C++ & CUDA C++
  • Hugging Face Transformers (authored merged PR to official repo)
  • Custom kernel development (Flash-Attention-style Triton kernels)
  • Unity/C#

Hobbies

  • Piano (Level 10, Shanghai Conservatory of Music)
  • Badminton
  • Basketball
  • Western Philosophy

Honors