
Minshen Zhang
Computer Science Graduate, NLP Researcher, Developer
About Me
Hi there! My name is Minshen Zhang. You can call me Alex.
I recently graduated with a Bachelor's degree in Computer Science and Technology from ShanghaiTech University in June 2025. I will be pursuing a Master's degree in Computer Science at UC San Diego starting Fall 2025.
My research interests primarily focus on Natural Language Processing and Machine Learning. Currently, I am researching Transformer mechanisms, in particular how to design faster and more expressive MLP modules. I am also investigating how Sparse Autoencoder (SAE) techniques can mitigate hallucination in large language models during RAG.
I'm open to academic and professional collaborations. If you are interested, please feel free to email me.
Education
UC San Diego
Master of Science in Computer Science (CS75)
ShanghaiTech University
Bachelor of Computer Science and Technology
Overall GPA: 3.69/4.0; Major GPA: 3.8+/4.0
English Tests: TOEFL (100), GRE (320), CET-6 (602), CET-4 (607), Duolingo (135)
UC Berkeley Extension
Berkeley GLOBE Student Program
Overall GPA: 4.0/4.0
Research Experience
Shanghai Alibaba Ant Group NLP Research Lab
Research Intern, NLP Research Group
Mentor: Xiang Hu
Observed that the multi-subspace structure of Multi-Head Attention significantly enhances the expressiveness of the attention mechanism. Given the high similarity of the mathematical formulations, hypothesized that the MLP component of a Transformer block can be decomposed into multi-head computations in the same way.
Designed an integration method for multi-head MLPs within standard Transformer blocks and a corresponding Triton kernel that improves throughput (1.06×) and reduces memory usage (over 2×) compared with the LlamaMLP (SwiGLU) baseline; project lead.
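The multi-head decomposition described above can be sketched roughly as follows. This is a minimal PyTorch illustration of the general idea, not the lab's actual implementation or Triton kernel; the per-head SwiGLU split, the class name, and all shapes are my assumptions.

```python
import torch
import torch.nn as nn

class MultiHeadMLP(nn.Module):
    """Hypothetical sketch: a SwiGLU MLP split into independent heads,
    analogous to how multi-head attention partitions its subspaces.
    Each head owns a slice of the model dimension and a smaller
    intermediate dimension."""

    def __init__(self, d_model: int, d_ff: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0 and d_ff % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.f_head = d_ff // n_heads
        # Per-head gate / up / down projections stored as batched weights.
        self.w_gate = nn.Parameter(torch.randn(n_heads, self.d_head, self.f_head) * 0.02)
        self.w_up   = nn.Parameter(torch.randn(n_heads, self.d_head, self.f_head) * 0.02)
        self.w_down = nn.Parameter(torch.randn(n_heads, self.f_head, self.d_head) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> per-head view: (batch, seq, n_heads, d_head)
        b, s, _ = x.shape
        xh = x.view(b, s, self.n_heads, self.d_head)
        gate = torch.einsum("bshd,hdf->bshf", xh, self.w_gate)
        up   = torch.einsum("bshd,hdf->bshf", xh, self.w_up)
        hidden = torch.nn.functional.silu(gate) * up   # SwiGLU, applied per head
        out = torch.einsum("bshf,hfd->bshd", hidden, self.w_down)
        return out.reshape(b, s, -1)
```

Because each head only mixes within its own slice, the parameter count and activation memory shrink relative to a dense MLP of the same width, which is consistent with the memory savings claimed above.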
ShanghaiTech Kewei Tu's Lab
Research Assistant, NLP Research Lab
Mentor: Prof. Kewei Tu (ShanghaiTech University)
Building on Have I Learned This Entity? (ICLR 2025), using sparse autoencoders to test whether an LLM encodes entity-specific knowledge in context and to identify sparse latent directions. Applying these directions to downstream tasks such as feedback-guided RAG retrieval, RAG-based hallucination detection, and latent-direction steering; project lead.
Shanghai Qizhi Institute
Research Intern, Machine Learning Research Group
Mentor: Prof. Tao Du (Tsinghua University)
Derived an analytical solution to the real-time inverse-refraction problem using physical optics, linear algebra, and numerical methods. Based on Snell's law, derived the relationship between water-surface normal vectors and observed visual information, establishing the feasibility of a machine-learning approach. Developed real-time algorithms for water-surface reconstruction to assist robot-control tasks, and implemented the core algorithms with NVIDIA Warp (CUDA), reducing per-frame time from 500 ms to 8 ms.
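The Snell's-law relationship mentioned above can be illustrated with a small worked example. This is my own sketch of the underlying vector identity (the vector form of Snell's law implies that n1·d_in − n2·d_out is parallel to the surface normal), not the project's actual reconstruction code; the function names and the air-to-water setup are assumptions.

```python
import numpy as np

def refract(d_in, normal, n1, n2):
    """Refract a unit direction d_in through a surface with unit normal,
    per the standard vector form of Snell's law. The normal is assumed
    to point toward the incident side (total internal reflection not handled)."""
    d_in = d_in / np.linalg.norm(d_in)
    normal = normal / np.linalg.norm(normal)
    cos_i = -np.dot(normal, d_in)          # cosine of the incidence angle
    eta = n1 / n2
    k = 1.0 - eta**2 * (1.0 - cos_i**2)
    assert k >= 0.0, "total internal reflection"
    return eta * d_in + (eta * cos_i - np.sqrt(k)) * normal

def normal_from_refraction(d_in, d_out, n1, n2):
    """Recover the surface normal from incident and refracted unit directions:
    n1*d_in - n2*d_out is parallel to the normal, so normalize that difference.
    The sign convention here assumes entering the denser medium (e.g. air->water)."""
    v = n1 * d_in - n2 * d_out
    return v / np.linalg.norm(v)
```

This inversion is what makes the visual observation informative: once the camera ray and the refracted ray are known, the water-surface normal follows in closed form, so a learned model only has to supply the correspondence, not the optics.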
Shanghai ElanTech Lab
Embodied Intelligence Research Group
Advisor: Prof. Lan Xu (ShanghaiTech University)
Implemented and improved network architectures including ResNet, BiRNN, diffusion models, and Transformers. Built a large-scale human motion-capture facility using OptiTrack and Z-CAM camera arrays for data collection. Integrated existing algorithms with the Motion Diffusion Model to reconstruct human motion from IMU sensor signals.
Projects
Arximia: LLM-Agent Based Scientific Collaboration Platform
12/2024 - Present
Building a comprehensive platform for scientists worldwide to read, search, discuss, reference, and manage scientific papers. Using React (frontend) and Python (backend) to create an integrated service that leverages LLM-agent technology for its core functionality.
This project is currently under development and will be available later at arximia.com.
Ray Tracing Rendering Engine
10/2023
Built a C++ ray tracer implementing Lambertian diffuse surfaces, fully and partially reflective metals, semi-transparent (dielectric) media, emissive light sources, and other materials. Rewrote the entire codebase in NVIDIA CUDA C++ and implemented importance sampling to accelerate rendering, achieving a 200× speedup.
Skills & Interests
Technical Skills
- Machine Learning & AI (PyTorch, Triton, pybind11)
- Python (extensive experience with high-performance ML/AI)
- C/C++ & CUDA C++
- Hugging Face Transformers (authored merged PR to official repo)
- Custom kernel development (Flash-Attention-style Triton kernels)
- Unity/C#
Hobbies
- Piano (Level 10, Shanghai Conservatory of Music)
- Badminton
- Basketball
- Western Philosophy
Honors
- Outstanding Graduate of ShanghaiTech University (2024-2025 Academic Year)
- Outstanding Student, ShanghaiTech University (2021-2022, 2023-2024)
- Teaching Assistant, CS100 Computer Programming, ShanghaiTech University (02/2024-06/2024)