Annu Sharma

About me:

I'm a Data Science grad student at the University of Maryland with a background in Electrical Engineering and Physics from BITS Pilani. Before UMD, I spent two years at Synechron's AI Finlabs building agentic systems, fine-tuning language models, and making sense of messy data. I'm drawn to problems at the intersection of machine learning and the real world, whether that's improving how models reason, making AI systems more reliable, or turning complex datasets into something meaningful.

📍 College Park, MD | 📬 sharma25@umd.edu | 🔗 LinkedIn

What I’m Working On

MS in Data Science @ UMD (2025–2027) — coursework in Machine Learning Principles, Big Data Systems, Algorithms for Data Science, Data Representation & Modeling, and Linear Algebra & Statistical Methods
Research on agentic AI, RAG systems, data science agents, reinforcement learning, weather forecasting, and responsible AI evaluation
Selected for the U21 Sustainable Policy Leadership Summer School (McMaster University, 2026) — one of ~60 participants globally

A Few Things I’ve Built

Project	What it does
Multi-Agent RAG System	Production-ready RAG pipeline with semantic caching, token optimization, and a live monitoring dashboard (FastAPI + LangGraph)
DS Agent Benchmark Comparison	Comparative evaluation of 4 data science agents across 4 benchmark datasets — measuring task completion, code correctness, and reasoning quality at scale (PySpark + Airflow)
Floor Plan Symbol Detector	Computer vision pipeline for architectural drawings — fine-tuned Faster R-CNN on the CubiCasa5K dataset, achieving mAP@0.5 of 0.54
YRBSS Survey Analysis	Statistical analysis of CDC national survey data (N ≈ 20,100) with post-stratification reweighting for sexual minority populations
RL Playground	Interactive dashboard for learning Reinforcement Learning from first principles — live MDP Visualizer, Gridworld with Value & Policy Iteration, Bellman equation explorer

RL Playground — Interactive RL Learning Dashboard
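
For a flavor of what the Value Iteration view in the RL Playground covers, here is a minimal sketch of value iteration on a small Gridworld. It is not the dashboard's actual code: the grid size, goal cell, step reward, discount factor, and every function name below are illustrative assumptions.

```python
from typing import Dict, Tuple

State = Tuple[int, int]

# Four deterministic moves, expressed as (row delta, column delta).
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def step(state: State, action: str, size: int) -> State:
    """Move one cell; bumping into a wall leaves the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < size and 0 <= c < size:
        return (r, c)
    return state


def value_iteration(
    size: int = 4,
    goal: State = (3, 3),
    step_reward: float = -1.0,
    gamma: float = 0.9,
    tol: float = 1e-9,
) -> Tuple[Dict[State, float], Dict[State, str]]:
    """Apply the Bellman optimality backup until values stop changing."""
    states = [(r, c) for r in range(size) for c in range(size)]
    values = {s: 0.0 for s in states}

    while True:
        delta = 0.0
        for s in states:
            if s == goal:
                continue  # terminal state keeps value 0
            # V(s) <- max_a [ r + gamma * V(s') ], updated in place.
            best = max(
                step_reward + gamma * values[step(s, a, size)] for a in ACTIONS
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {
        s: max(ACTIONS, key=lambda a: step_reward + gamma * values[step(s, a, size)])
        for s in states
        if s != goal
    }
    return values, policy
```

With `gamma=1.0` and a reward of -1 per step, each cell's value works out to the negative of its shortest-path distance to the goal, which makes the result easy to check by hand.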

Experience

Jr. Associate, ML Research — Synechron Technologies, Finlabs (July 2024 – July 2025)

Focused on production-grade ML systems for the financial domain. Fine-tuned the Qwen2 Vision-Language Model using LoRA and 4-bit quantization for financial document extraction, improving F1 from 0.68 to 0.85 on a manually labeled test set. Trained RoBERTa-based embeddings combined with XGBoost and Random Forest classifiers for customer segmentation across 50K records, evaluated the pipeline with stratified cross-validation, and integrated it into a production demo. Also built a FastAPI microservice for automated PostgreSQL DDL validation that catches referential integrity and primary key errors before production deployment, and contributed code and technical documentation to an in-house agentic AI framework.

Skills developed: LLM fine-tuning (LoRA, quantization), vision-language models, ensemble classifiers, FastAPI, PostgreSQL, agentic systems, technical documentation

AI/ML Intern — Synechron Technologies, Finlabs (July 2023 – June 2024)

Built foundational agentic and RAG infrastructure from the ground up. Developed multi-agent workflows using LangGraph and AutoGen with locally hosted open-source models via Ollama, writing evaluation criteria and performance metrics for each agent node to track output consistency and identify failure modes. Built an end-to-end document QA pipeline using ChromaDB, LangChain, and Streamlit to extract structured insights from unstructured financial documents at scale. Presented findings to the broader team and maintained reproducible documentation throughout.

Skills developed: multi-agent orchestration, RAG pipelines, LangGraph, LangChain, ChromaDB, Streamlit, LLM evaluation, vector databases

Skills

Languages: Python · R · SQL · C · HTML

ML / AI: PyTorch · TensorFlow · Scikit-learn · LangChain · LangGraph · AutoGen · FastAPI · spaCy · NLTK · LLM Fine-Tuning (LoRA) · Diffusion Models · OpenCV · Embeddings

Data & Big Data: PySpark · Airflow · Pandas · NumPy · SciPy · Matplotlib · Power BI

Tools & Platforms: Git · Docker · Weights & Biases · ChromaDB · MongoDB · PostgreSQL · AWS · GCP

Interests: Generative AI · Computer Vision · Agentic Systems · Statistical Modeling · Responsible AI

Writing & Research

Towards Evaluating Robustness of Prompt Adherence in Text-to-Image Models — Co-authored evaluation of prompt faithfulness across 5 state-of-the-art text-to-image models; designed a VAE-based evaluation framework and tracked all experiments in Weights & Biases (2024–2025)
AI Hallucinations: Why Bots Make Up Information — Published on Synechron’s platform; explains LLM hallucinations for a non-technical audience (2024)

Always open to research collaborations, interesting problems, and good conversations.