Research Scientist @ServiceNow | Master's @MILA AI Institute | Ex-SWE | NIT Warangal
Applied Research Scientist at ServiceNow focused on agentic document workflows, multilingual and multimodal search, red-teaming, and scalable ML systems across text, image, audio, and video.
About
Applied Research Scientist at ServiceNow building multimodal document intelligence, retrieval, evaluation, and AI safety systems for enterprise AI.
My work spans agentic document workflows, multilingual and multimodal search, red-teaming, and scalable ML services across text, image, audio, and video.
I combine research training from MILA with hands-on product engineering to turn strong ideas into systems that ship.
What this portfolio emphasizes
Focus Areas
Document understanding, text-image retrieval, multilingual embeddings, vector search, OCR-aware reasoning.
Embeddings · CLIP · BERT · Pinecone · cross-document reasoning
Enterprise document agents that combine extraction, summarization, question answering, and workflow orchestration.
Agentic Frameworks · parsing pipelines · document classification · orchestration
Adversarial testing, prompt-injection defense, safety datasets, and robustness evaluation for multimodal agents.
Red teaming · prompt hardening · sanitization · benchmark design
Scalable pipelines that connect research models to services, monitoring, and real user workflows.
Kubernetes · Docker · GraphQL · React · model serving · microservices
Experience
text-moderate npm package for LLM content moderation.
Selected Work
Flagship project
Built a multilingual mixture-of-experts architecture to improve low-resource language performance without scaling a single monolithic model.
Combined a compact base model with task-specific experts through cross-attention and selective layer freezing to keep training efficient.
Improved low-resource multilingual performance by 28% while reducing compute cost by roughly 60%.
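A minimal sketch of the mixture-of-experts idea behind this project (illustrative only: the router, expert names, and top-k gating here are hypothetical stand-ins, not the actual architecture): a gate scores every expert, only the top-k run, so capacity grows without the per-input compute of a single monolithic model.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

class MoERouter:
    """Route each input to a small subset of language/task experts."""

    def __init__(self, experts, top_k=2):
        self.experts = experts  # name -> callable expert
        self.top_k = top_k

    def forward(self, x, gate_scores):
        probs = softmax(gate_scores)
        # Keep only the top-k experts; the rest are skipped entirely.
        ranked = sorted(zip(self.experts, probs), key=lambda t: -t[1])[: self.top_k]
        total = sum(p for _, p in ranked)
        # Output is a renormalized weighted mixture of the selected experts.
        return sum(self.experts[name](x) * (p / total) for name, p in ranked)

# Toy experts: each "expert" is just a scaling function for illustration.
experts = {"en": lambda x: 2 * x, "hi": lambda x: 3 * x, "sw": lambda x: 5 * x}
router = MoERouter(experts, top_k=2)
out = router.forward(1.0, gate_scores=[0.1, 2.0, 1.0])
```

With top_k=2 only the two highest-scoring experts execute, which is the efficiency lever: adding a new low-resource-language expert does not slow down inputs routed elsewhere.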
Multimodal systems
Designed a multimodal reasoning system for joint text-image understanding instead of a pure text-only LLM workflow.
Integrated CLIP ViT with StableLM via cross-attention and packaged the workflow with Docker and Hugging Face Hub for reproducible experimentation.
Turned a research prototype into a repeatable multimodal experimentation stack rather than a notebook-only demo.
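The fusion mechanism can be sketched as single-head cross-attention where text-token states query image features (a simplified illustration, not the production code: real systems project Q/K/V with learned matrices and use many heads; here the states are used directly so the mechanism stays visible).

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(text_states, image_feats):
    """Each text token attends over all image features and becomes
    a weighted mixture of them (scaled dot-product attention)."""
    d = len(image_feats[0])
    out = []
    for q in text_states:
        # Similarity of this text token to every image patch feature.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in image_feats]
        w = softmax(scores)
        # Mix image features by attention weight, dimension by dimension.
        out.append([sum(wj * v[i] for wj, v in zip(w, image_feats))
                    for i in range(d)])
    return out

text = [[1.0, 0.0], [0.0, 1.0]]               # two toy text-token states
image = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy patch features
fused = cross_attention(text, image)
```

In the full system the image features would come from a CLIP ViT encoder and the fused states would feed the language model's decoder layers.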
Live deployment
Built an end-to-end expected-goals pipeline for hockey analytics with real-time data and live model serving.
Shipped a Dockerized Flask API, Streamlit dashboard, NHL API ingestion layer, and CometML-backed model registry for experiment hot-swapping.
Demonstrated production-style model serving, monitoring, and real-time prediction loops in a research environment.
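The core of such a pipeline is an expected-goals (xG) model that maps shot features to a scoring probability. A toy sketch under stated assumptions: the logistic form, feature choice, and coefficients below are made up for illustration; the real pipeline loads a trained model from the registry rather than hard-coding weights.

```python
import math

def expected_goals(distance_ft, angle_deg, coef=(-0.05, -0.02), intercept=1.0):
    """Toy xG model: probability that a shot scores, given its
    distance from the net and shooting angle off center."""
    z = intercept + coef[0] * distance_ft + coef[1] * angle_deg
    return 1.0 / (1.0 + math.exp(-z))  # logistic link -> probability in (0, 1)

# Closer, more central shots should yield a higher xG.
close_shot = expected_goals(distance_ft=10, angle_deg=5)
far_shot = expected_goals(distance_ft=60, angle_deg=40)
```

In the deployed version this prediction function sits behind the Dockerized Flask API, with features computed from live NHL API events and the model checkpoint hot-swapped via the registry.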
Publications
Master's thesis exploring multimodal retrieval for document intelligence.
Framework for more reliable evaluation of applied LLM systems.
Study of LLM behavior in mediation and AI-based dispute resolution settings.
Applied AI analysis for humanitarian and crisis-response contexts.
Computer vision work focused on robust detection under low-quality surveillance conditions.
Education
Graduated Dec 2024
Master's in Computer Science, Machine Learning specialization
Grade: 4.2 / 4.3 · Affiliated with UdeM / McGill
Graduated May 2021
B.Tech in Computer Science & Engineering
Strong foundation in algorithms, systems, ML, and software engineering.
Contact
The fastest way to reach me is by email or LinkedIn. For the most up-to-date overview of my background, use the resume link below.