All my projects share a common goal: to evolve "blind, deaf" Large Language Models into superhuman systems that can see, hear, and reason about the physical world, in a language we all understand and trust. I aim to usher in a new era of robotics and AI capable of surpassing human abilities in certain tasks while remaining transparent, controllable, and beneficial to humanity.
I developed a cutting-edge dual-loop system that integrates a responsive neural implant with an AI-powered wearable platform to continuously monitor PTSD episodes and intervene as they occur. To my knowledge, this is the first automatic self-reporting design in its field, enabling continuous capture and analysis of both neural signals and environmental context without requiring manual input from patients.
The implanted device detects pathological brain activity in real time and delivers targeted neurostimulation when needed. Simultaneously, smart wearables and Meta glasses automatically identify environmental triggers and record contextual data, allowing for near-instant feedback and personalized therapeutic interventions.
Leveraging a multimodal large language model, the system integrates internal neural states with external situational awareness, bridging the gap between clinical treatment and everyday life and marking a pivotal advance for both PTSD therapy and naturalistic neuroscience research.
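Below is a minimal sketch of one cycle of the dual-loop logic; the interface names (read_neural_window, capture_context, plan_intervention, and so on) are illustrative placeholders rather than the deployed device API.

```python
# Minimal sketch of one dual-loop cycle. All interfaces (read_neural_window,
# detect_pathological_activity, capture_context, plan_intervention) are
# illustrative placeholders, not the deployed device API.
import time

STIM_THRESHOLD = 0.8  # illustrative probability cutoff for triggering stimulation

def closed_loop_step(implant, wearable, mllm):
    # Inner loop: read a short window of neural data and score it for pathological activity.
    window = implant.read_neural_window(duration_s=2.0)
    episode_prob = implant.detect_pathological_activity(window)

    # Outer loop: capture environmental context (audio, egocentric video, location).
    context = wearable.capture_context()

    # Fuse the internal state with the external context via a multimodal LLM
    # to decide whether and how to intervene.
    plan = mllm.plan_intervention(neural_score=episode_prob, context=context)

    if episode_prob > STIM_THRESHOLD:
        implant.deliver_stimulation(params=plan.stimulation_params)

    # Automatic self-report: log both streams with no patient input required.
    return {"time": time.time(), "neural_score": episode_prob,
            "context": context, "plan": plan}
```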
I co-authored research, published on arXiv and accepted to an ICLR 2025 workshop, that introduces what we believe is the first unified language system of its kind. This project addresses a critical danger: as multiple intelligent agents converse, they can develop private "sub-languages" that humans cannot interpret.
We observed that DeepSeek-R1-Zero's reasoning traces appear to switch languages depending on the type of problem, which inspired us to design a language system that is AI-centric yet still interpretable by humans. Instead of adding more tokens and attention blocks to large models, we focused on compressing meaning into a more efficient representation, one that spans the full semantic space without the constraints of human grammar.
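To make the compression idea concrete, here is a toy PyTorch sketch that pools a token sequence into a short sequence of dense "concept" vectors agents could exchange instead of longer natural-language text; the dimensions and modules are illustrative, not the published architecture.

```python
# Toy sketch: compress a token sequence into a few dense "concept" vectors that
# agents exchange, rather than longer human-grammar text. Illustrative only.
import torch
import torch.nn as nn

class SemanticCompressor(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_concepts=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        # Learned queries that pool the message into a fixed, short concept sequence.
        self.concept_queries = nn.Parameter(torch.randn(n_concepts, d_model))
        self.pool = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        h = self.encoder(self.embed(token_ids))        # contextualized token states
        q = self.concept_queries.expand(token_ids.size(0), -1, -1)
        concepts, _ = self.pool(q, h, h)               # (batch, n_concepts, d_model)
        return concepts                                # compact inter-agent message

msg = torch.randint(0, 32000, (1, 128))                # a 128-token utterance
print(SemanticCompressor()(msg).shape)                 # torch.Size([1, 8, 512])
```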
Our unified language safeguards against opaque AI behavior, keeps agent-to-agent conversations interpretable, and could reduce inference overhead by roughly 15%, which we estimate would cut computation costs by as much as $15.9 billion annually.
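As a back-of-envelope check on the scale of that estimate (my own arithmetic, not a figure from the paper), a $15.9 billion saving from a 15% efficiency gain implies an assumed baseline of roughly $106 billion in annual inference spend:

```python
# Back-of-envelope: what baseline inference spend does the savings claim imply?
overhead_reduction = 0.15          # claimed efficiency gain
claimed_savings_b = 15.9           # claimed annual savings, in billions of dollars
implied_baseline_b = claimed_savings_b / overhead_reduction
print(f"Implied annual inference spend: ~${implied_baseline_b:.0f}B")  # ~$106B
```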
To empower LLMs with visual capabilities, I developed a 4D time-encoded Vision Transformer with temporal continuity and spatial intelligence, achieving near-superhuman comprehension of dynamic real-world events.
While existing solutions like SLAM and NeRF focus on static scenes, our approach distinguishes the elements of a scene that stay constant from those that change over time and interprets those changes according to physical laws, allowing it to predict outcomes and explain why they occur. For example, our model can infer that a ball may collide with a TV even when the two never appear in the same frame.
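The sketch below illustrates the kind of space-time positional encoding such a model could use, tagging each patch token with its (x, y) location, frame index, and a coarse depth bin; module names and dimensions are illustrative rather than the exact architecture.

```python
# Sketch of a space-time patch embedding for a video ViT: each patch token carries
# embeddings for its (x, y) position, frame index t, and a coarse depth bin z, so
# attention can reason about where and when things happen. Illustrative only.
import torch
import torch.nn as nn

class SpaceTimePatchEmbed(nn.Module):
    def __init__(self, patch=16, d_model=768, grid=14, n_frames=32, n_depth_bins=16):
        super().__init__()
        self.proj = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.pos_x = nn.Embedding(grid, d_model)
        self.pos_y = nn.Embedding(grid, d_model)
        self.pos_t = nn.Embedding(n_frames, d_model)
        self.pos_z = nn.Embedding(n_depth_bins, d_model)
        self.grid = grid

    def forward(self, video, depth_bins):               # video: (B, T, 3, H, W)
        B, T, C, H, W = video.shape
        x = self.proj(video.flatten(0, 1))               # (B*T, d, grid, grid)
        x = x.flatten(2).transpose(1, 2)                  # (B*T, grid*grid, d)
        ys, xs = torch.meshgrid(torch.arange(self.grid), torch.arange(self.grid),
                                indexing="ij")
        spatial = self.pos_x(xs.flatten()) + self.pos_y(ys.flatten())
        t_idx = torch.arange(T).repeat_interleave(self.grid * self.grid)
        x = x + spatial                                    # broadcast over B*T
        x = x.reshape(B, T * self.grid * self.grid, -1) + self.pos_t(t_idx)
        x = x + self.pos_z(depth_bins)                     # depth_bins: (B, T*grid*grid)
        return x                                           # tokens for a Transformer

tokens = SpaceTimePatchEmbed()(torch.randn(1, 32, 3, 224, 224),
                               torch.zeros(1, 32 * 14 * 14, dtype=torch.long))
print(tokens.shape)                                        # torch.Size([1, 6272, 768])
```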
In benchmarking, our model surpassed industry-leading metrics and outperformed competing systems, including Google Gemini 1.5 and Reka, almost three weeks before Google released Gemini 2.0 Flash. We have initiated discussions with the MIT Mechanical Translational Engineering Lab about deploying this model to track pig health and emotional states from surveillance data.
Most speech-to-text systems focus exclusively on converting audio into raw text, overlooking context, non-speech cues, and emotional nuance. Our system genuinely "listens" to and interprets the full soundtrack, adding clarity to ambiguous phrases (for example, distinguishing whether "the fish is 1,000 pounds" refers to its weight or its price).
Our pipeline enhances transcripts with contextual coherence, segments content logically, incorporates ambient sounds, and analyzes tone patterns to describe emotional content—creating rich transcripts that capture the full communicative experience.
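A rough sketch of how these stages can be wired together is shown below; the component handles (asr, audio_tagger, prosody_model, llm) and their methods are hypothetical placeholders for the actual models in the pipeline.

```python
# Rough sketch of the enrichment pipeline. The handles asr, audio_tagger,
# prosody_model, and llm are hypothetical stand-ins for the real stages.
from dataclasses import dataclass

@dataclass
class EnrichedSegment:
    start: float
    end: float
    text: str            # disambiguated, context-aware transcript text
    sounds: list          # e.g. ["door slam", "laughter"]
    tone: str             # e.g. "excited", "hesitant"

def enrich(audio_path: str, asr, audio_tagger, prosody_model, llm) -> list:
    segments = asr.transcribe(audio_path)                      # raw text + timestamps
    enriched = []
    for seg in segments:
        sounds = audio_tagger.tag(audio_path, seg.start, seg.end)
        tone = prosody_model.estimate(audio_path, seg.start, seg.end)
        # The LLM resolves ambiguity ("1,000 pounds": weight vs. price) using the
        # preceding transcript, detected ambient sounds, and tone as extra context.
        text = llm.rewrite_with_context(seg.text, context=enriched[-3:],
                                        sounds=sounds, tone=tone)
        enriched.append(EnrichedSegment(seg.start, seg.end, text, sounds, tone))
    return enriched
```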
This approach expands accessibility for the deaf and hard-of-hearing community while being 20 times cheaper than human transcription ($0.10 versus $2.00 per minute). With the film subtitling market valued at $8.51 billion in 2024, we estimate our model could save up to $6 billion of that spend. Despite its profit potential, we open-sourced the solution to promote inclusivity.
Motivated by the stigma many people face in seeking professional therapy, I built a model infused with teachings from diverse philosophies, blending Buddhism, Stoicism, and modern psychology. I deployed an agentic workflow, loosely akin to Monte Carlo tree search, that chains multiple specialized sub-models for reasoning, context analysis, and moral guidance.
Chain-of-thought reasoning enhances this companion AI by allowing it to piece together context, emotion, and user intent more thoroughly than a single-pass approach. It recognizes subtle cues, such as signs of suicidal ideation, more reliably while maintaining empathetic engagement.
This approach predates mainstream "reasoning models," yet parallels their chain-of-thought breakthroughs. It demonstrates how orchestrating multiple specialized sub-LLMs, in the spirit of a mixture-of-experts design, can elevate response quality by integrating spirituality and psychology into a cohesive mental health tool.
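A simplified sketch of the orchestration logic follows: specialist sub-models each draft a reply, a safety check screens for crisis cues first, and a scorer selects the strongest candidate. This is a shallow stand-in for the full tree-search-style workflow, and every handle in it is hypothetical.

```python
# Simplified orchestration sketch: specialists draft, a safety model screens,
# a scorer picks the best reply. All model handles here are hypothetical.
def respond(user_message, history, context_model, sub_models, safety_model, scorer):
    # 1. Context analysis: summarize emotional state and intent from the conversation.
    context = context_model.analyze(history + [user_message])

    # 2. Safety first: crisis cues (e.g. suicidal ideation) route to a dedicated path.
    if safety_model.detect_crisis(user_message, context):
        return safety_model.crisis_response(context)

    # 3. Each specialist (e.g. Stoic, Buddhist, psychology-informed) drafts a candidate.
    candidates = [m.draft(user_message, context) for m in sub_models]

    # 4. Score candidates on empathy, relevance, and guidance quality; keep the best.
    return max(candidates, key=lambda c: scorer.score(c, user_message, context))
```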
I founded HUIJUEXIN LTD, a company dedicated to spiritual well-being, and authored the companion book "Earth Online Game Guidance." This venture stems from my conviction that any planet's progress requires both technological innovation and spiritual growth. Guided by my New Age beliefs, I aim to accelerate humanity's spiritual evolution in tandem with advancing technology.
Over the past three years, I've also supported various animal welfare initiatives, reflecting my commitment to creating a holistic impact that nurtures both people and the planet.
Our efforts caught the attention of industry leaders, and we were subsequently invited to join the prestigious SMC Shanghai Foundation Model Innovation Center, China's largest AI incubator. By uniting cutting-edge technology with spiritual insight, HUIJUEXIN LTD aspires to catalyze a brighter, more enlightened future for all.
Music is another form of spiritual expression in my journey. Here's a piece I composed: