Open to Opportunities

Karthik Mulugu

AI Engineer · Machine Learning · Data Scientist


Work Experience

AI Engineer Intern

Pronix Inc

Oct 2025 – Present · Remote

Boosted retrieval accuracy by 35% by building RAG pipelines with Azure/OpenAI LLMs using embedding-based and hybrid retrieval techniques, enabling more context-aware chatbot responses.
Reduced HR support workload by 40% by designing and deploying chatbot workflows on Kore.ai Agentic Platform, integrating Workday APIs for real-time employee profile and leave management.
Increased chatbot task success rate by 25% and reduced fallback responses by 30% by optimizing intent classification, dialog state management, and workflow orchestration.
Eliminated critical edge-case failures by designing and executing scenario-based validation strategies within Salesforce for JBL, improving chatbot response consistency across high-priority workflows.

RAG · NLP · OpenAI · Azure · Workday API · Kore.ai · Generative AI

AI Automation Extern

Extern

Top Performer · Remote · Jun–Sep 2025

Increased mortgage document-processing efficiency by 70% by building scalable Python automation pipelines to parse, chunk, and structure 1,000+ financial documents, reducing manual review time from hours to minutes.
Improved OCR accuracy by 35% and retrieval precision by 40% by optimizing Tesseract/EasyOCR and architecting a RAG pipeline using LangChain, LlamaIndex, and FAISS for high-precision document search and QA.
Boosted answer accuracy by 28% and reduced fallback responses by 35% by benchmarking and optimizing Gemini vs. Mistral models — earning recognition as Top Performer for technical impact.

Python · OCR · RAG · LlamaIndex · FAISS · Gradio · LLMs
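The parse-and-chunk step mentioned in the first bullet above can be illustrated with a minimal sketch. The function name `chunk_text` and the chunk size and overlap values are assumptions for illustration, not the settings used in the actual pipeline.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=2):
        print(chunk)
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.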

Beyond work

ISS Volunteer

University at Buffalo · Aug 2024

Welcomed international grad students during Fall 2024 orientation, guiding them on campus resources and community integration.

Hackathon Coordinator

HackFiesta — TechnoMist 2K23 · Hyderabad

Organised a large-scale hackathon end-to-end, coordinating multiple teams, mentors, and event logistics.

Education

M.S. Computer Science, AI/ML · 3.7 GPA

University at Buffalo, SUNY

Jan 2024 – Jun 2025 · Buffalo, NY

Specialised in ML, Deep Learning, Computer Vision, and Data Visualization — maintaining a 3.7 GPA while concurrently building production AI systems across NLP, RAG, and forecasting domains.

B.Tech. Information Technology

Jawaharlal Nehru Technological University, Hyderabad

Aug 2019 – Jul 2023 · Hyderabad, India

Built solid foundations in DSA, DBMS, OOP, and systems programming. Ranked in the top 1% of the IT department for academic performance across core engineering coursework.

Research

IEEE Access · Under Review · Apr 2026

DocuQuery: Hybrid Lexical-Dense Retrieval with LangGraph Orchestration for Robust PDF Question Answering

Karthik Mulugu · University at Buffalo

10.13140/RG.2.2.21211.73766

↑75% Faster Analysis: document processing vs. single-retriever baseline
↑30% Retrieval Precision: over BM25-only across multi-intent query sets
3-way Hybrid Fusion: BM25 + FAISS + TF-IDF with query-aware gating

DocuQuery is an open-source system for natural-language question answering over PDF documents. It combines hybrid retrieval — BM25, FAISS, and TF-IDF — with LangGraph multi-intent orchestration that routes queries across summarisation, comparison, refinement, and direct QA modes, backed by Gemini generation. A key finding is that fixed-weight hybrid fusion can degrade below pure lexical retrieval when dense similarity is misleading, motivating a query-aware fusion gating mechanism.
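The idea of query-aware fusion gating can be sketched as follows. This is not the mechanism from the paper: the gate in `dense_weight` is a hypothetical heuristic (trust dense similarity less when the query looks like an exact-match lookup), and the sketch fuses only two score sets, one lexical and one dense, rather than the three retrievers named above.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so lexical and dense scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def dense_weight(query: str, base: float = 0.5) -> float:
    """Hypothetical gate: shrink the dense weight as the share of
    exact-match-looking tokens (digits, all-caps codes) grows."""
    tokens = query.split()
    if not tokens:
        return base
    exact = sum(any(ch.isdigit() for ch in t) or t.isupper() for t in tokens)
    return base * (1.0 - exact / len(tokens))


def fuse(query: str, lexical: dict[str, float], dense: dict[str, float]):
    """Return (doc_id, fused_score) pairs, best first."""
    w = dense_weight(query)
    lex_n, den_n = min_max(lexical), min_max(dense)
    fused = {
        doc: (1.0 - w) * lex_n.get(doc, 0.0) + w * den_n.get(doc, 0.0)
        for doc in set(lex_n) | set(den_n)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    lexical = {"doc_a": 2.0, "doc_b": 1.0}
    dense = {"doc_a": 0.1, "doc_b": 0.9}
    print(fuse("invoice 2023 FY", lexical, dense))
```

With a fixed weight of 0.5 the two documents in the example would tie; the gate lowers the dense weight for the code-like query, so the lexical match ranks first.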

Projects


PerceptAI — Real-Time Vision Intelligence

Real-time multi-modal computer vision system that simultaneously analyzes face, hands, body posture, and scene objects — streamed live to a browser dashboard with AI-generated behavioral reports.

Runs 5 concurrent ML models per frame — face mesh (468 landmarks), hand tracking (21 keypoints), body pose (33 keypoints), Facenet512 face recognition, and YOLO-World object detection — in a single CPU-only Docker container with no GPU dependency
7-class emotion recognition with per-class confidence scores, PERCLOS drowsiness monitoring, gaze classification (Left/Center/Right), blink counting, and 3-axis head pose estimation — all computed locally with zero external API calls
Built cross-session face memory using Facenet512 embeddings stored in a ChromaDB vector database with DBSCAN clustering — re-identifies faces across sessions without storing raw images
Deployed on HuggingFace Spaces via Docker with a browser-camera WebSocket architecture — browser captures frames via WebRTC getUserMedia, no server camera needed, works on any cloud

Python · FastAPI · MediaPipe · YOLO-World · DeepFace · TensorFlow · ChromaDB · Docker · WebSocket

Live app may take ~30s to wake
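The PERCLOS drowsiness metric listed above is the share of recent frames in which the eyes are mostly closed. A minimal sketch, assuming each frame already yields an eye-openness value in [0, 1]; the class name, the 900-frame window (about 30 s at 30 fps), and the 0.2 openness cut-off (eyes at least 80% closed) are illustrative assumptions, not the project's actual parameters.

```python
from collections import deque


class PerclosMonitor:
    """Rolling PERCLOS: fraction of frames in the window with eyes closed."""

    def __init__(self, window: int = 900, closed_below: float = 0.2):
        self.closed_below = closed_below
        # Store one boolean per frame; old frames fall off automatically.
        self.frames: deque[bool] = deque(maxlen=window)

    def update(self, openness: float) -> float:
        """Record one frame's eye openness and return the current PERCLOS."""
        self.frames.append(openness < self.closed_below)
        return sum(self.frames) / len(self.frames)


if __name__ == "__main__":
    monitor = PerclosMonitor(window=4)
    for value in [1.0, 0.1, 0.1, 1.0, 1.0, 1.0]:
        print(f"openness={value:.1f} perclos={monitor.update(value):.2f}")
```

A drowsiness alert would then fire when the returned value stays above some chosen level; that level is application-specific and is left out here.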

Hospital Inpatient Cost Prediction

End-to-end ML pipeline to predict patient costs, explain cost drivers with SHAP, and surface uncertainty estimates — built on 2.5M real NY state records for clinical finance decision-support.

Benchmarked 5 model families with Optuna; XGBoost achieved R²=0.969, RMSE=$2,035 — 58% RMSE reduction over linear baseline
SHAP TreeExplainer for feature attribution + quantile regression for calibrated 80% prediction intervals, enabling clinicians to act on uncertainty rather than point estimates
FastAPI service + 6-tab Streamlit dashboard + MLflow experiment tracking — all orchestrated via Docker Compose for one-command deployment

Python · XGBoost · SHAP · FastAPI · Streamlit · MLflow
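The 80% prediction intervals mentioned above typically come from two quantile models, one for the 10th and one for the 90th percentile. The sketch below shows the pinball (quantile) loss such models minimise and a coverage check for the resulting interval. The function names and sample numbers are illustrative; nothing here is taken from the project's code or data.

```python
def pinball_loss(y_true: list[float], y_pred: list[float], q: float) -> float:
    """Mean pinball loss for quantile q: under-prediction costs q per unit,
    over-prediction costs (1 - q) per unit."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def coverage(y_true: list[float], lower: list[float], upper: list[float]) -> float:
    """Fraction of true values inside [lower, upper]; a calibrated
    80% interval should land near 0.8 on held-out data."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return inside / len(y_true)


if __name__ == "__main__":
    costs = [1.0, 5.0, 9.0, 20.0]   # made-up values, arbitrary units
    lower = [0.0, 4.0, 10.0, 15.0]  # hypothetical q=0.1 predictions
    upper = [2.0, 6.0, 12.0, 25.0]  # hypothetical q=0.9 predictions
    print("coverage:", coverage(costs, lower, upper))
    print("pinball loss at q=0.9:", pinball_loss(costs, upper, 0.9))
```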


DocuQuery AI Assistant

Production-ready RAG system for natural-language question answering over PDF documents — combining hybrid retrieval with LangGraph multi-intent orchestration. Submitted to IEEE Access.

75% faster document analysis via LangGraph orchestration routing queries across summarisation, comparison, refinement, and QA modes backed by Gemini generation
30% retrieval precision gain using hybrid BM25 + FAISS + TF-IDF with cross-encoder reranking — outperforms pure dense retrieval on factual QA
Key finding: fixed-weight hybrid fusion degrades below pure lexical retrieval when dense similarity is misleading — motivating query-aware fusion gating (detailed in the paper)

→ see Research section for full paper details

LangGraph · Gemini · FAISS · Pinecone · RAG


Sentiment Analysis — DistilBERT + BiLSTM

Comparative study of transformer vs. recurrent architectures on 204K HuffPost headlines, with a live RSS pipeline that surfaces hour-by-hour sentiment trends from 6 major outlets.

Fine-tuned DistilBERT achieved 80.4% accuracy and 0.919 ROC-AUC, outperforming BiLSTM by 4.4 pts — gap traced to noisy pseudo-labels in multi-class setup, not model capacity
Live RSS ingestion from 6 outlets (BBC, Reuters, CNN, NYT, Guardian, AP) with hour-by-hour sentiment trend visualisation
Full MLOps stack: FastAPI inference service + Streamlit dashboard + MLflow tracking + Docker + GitHub Actions CI — deployed to Streamlit Cloud

PyTorch · NLP · FastAPI · Streamlit · Docker


Amazon Stock Forecasting Dashboard

Multi-model forecasting dashboard on AWS EC2 — Seq2Seq LSTM reduced MAPE from 17.45% to 8.86% with technical indicator fusion and backtested trading signals.

Seq2Seq LSTM with encoder-decoder architecture trained on 6,300+ daily records; added RSI, MACD, and Bollinger Bands as input features — MAPE dropped from 17.45% to 8.86%
Sentiment analysis pipeline on financial news fused with price signals for buy/sell evaluation — backtested against S&P 500 benchmark
Production pipeline on AWS EC2 with systemd automation for continuous data refresh and model retraining; Streamlit dashboard with multi-timeframe selector

LSTM · Seq2Seq · Streamlit · AWS EC2
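For reference, the MAPE metric and the RSI indicator named above can be computed as in this minimal sketch. The RSI here uses a plain average over the last `period` price changes; Wilder's smoothed variant is also common, and which one the project used is not stated, so treat this as an assumption.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rsi(closes: list[float], period: int = 14) -> float:
    """Relative Strength Index from the last `period` close-to-close changes."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    recent = closes[-(period + 1):]
    changes = [b - a for a, b in zip(recent[:-1], recent[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0  # no down moves in the window
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)


if __name__ == "__main__":
    print("MAPE:", mape([100.0, 200.0], [110.0, 180.0]))
    print("RSI:", rsi([1.0, 2.0, 3.0, 2.0], period=3))
```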


Customer Churn Prediction System

End-to-end ML pipeline that predicts at-risk telecom customers, explains the drivers behind each prediction, and surfaces actionable retention strategies — designed for direct business use.

Trained XGBoost on 7,000+ telecom records with 12 SQL-engineered features; achieved 88% recall on the minority (churn) class — the metric that matters most for retention campaigns
Cohort SQL analysis pinpointed that high-charge month-to-month customers drive 63% of churn, directly informing which segment to target first with discount offers
Deployed Streamlit app with a live decision-threshold slider and per-customer SHAP waterfall charts — lets non-technical stakeholders explore predictions without touching code

XGBoost · SHAP · SQL · Streamlit · Tableau
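The effect of the decision-threshold slider described above can be sketched as follows: lowering the threshold flags more customers as at risk, which raises recall on the churn class and usually lowers precision. The function name and the sample labels and probabilities are made up for illustration.

```python
def recall_precision(y_true: list[int], churn_prob: list[float], threshold: float):
    """Recall and precision for the churn class (label 1) at a given threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, churn_prob):
        predicted_churn = prob >= threshold
        if predicted_churn and label == 1:
            tp += 1
        elif predicted_churn and label == 0:
            fp += 1
        elif not predicted_churn and label == 1:
            fn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0]
    probs = [0.9, 0.4, 0.6, 0.7, 0.1]
    for t in (0.5, 0.3):
        r, p = recall_precision(labels, probs, t)
        print(f"threshold={t}: recall={r:.2f} precision={p:.2f}")
```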


Human Action Recognition — CNN Models

Systematic benchmark of four CNN architectures for classifying 40 human actions across 9,500+ Stanford 40 images — comparing training-from-scratch versus transfer learning strategies.

Trained VGG16 and ResNet50 from scratch vs. GoogLeNet and DenseNet with ImageNet transfer learning — DenseNet transfer achieved best accuracy with 3x fewer training epochs
80% accuracy on the held-out test set, plus a live Streamlit app for real-time prediction from webcam or image upload
Systematic ablation study comparing batch sizes, learning rates, and augmentation strategies across all four architectures — results documented with loss/accuracy curves

TensorFlow · ResNet50 · Transfer Learning · Streamlit

Text Summarization — AI Agent + Groq

Autonomous AI agent with Groq Llama 3.3 70B that detects document domain, selects the optimal summarisation strategy, evaluates quality with ROUGE-L, and auto-retries on low scores.

4-step agentic pipeline: detect domain → select prompt strategy → evaluate with ROUGE-L → auto-retry if score below threshold — no human intervention needed
SSE streaming for real-time token output; multi-modal input (text, PDF, URL); export as TXT, JSON, or Markdown — all via a single Vanilla JS frontend
FastAPI backend + Docker container + GitHub Actions CI/CD pipeline deploying to HuggingFace Spaces on every push

Python · Groq · FastAPI · Docker · HuggingFace
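The evaluate-and-retry part of the pipeline above can be sketched in plain Python. ROUGE-L is computed from the longest common subsequence of tokens. The `summarise` callable stands in for the LLM call and is hypothetical, as are the threshold, the attempt limit, and the choice to score the summary against the source text itself.

```python
from typing import Callable


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 on lowercased whitespace tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def summarise_with_retry(
    text: str,
    summarise: Callable[[str, int], str],  # hypothetical: (text, attempt) -> summary
    threshold: float = 0.3,
    max_attempts: int = 3,
) -> tuple[str, float]:
    """Call `summarise` until the score clears the threshold; return the best result."""
    best_summary, best_score = "", -1.0
    for attempt in range(max_attempts):
        summary = summarise(text, attempt)
        score = rouge_l_f1(text, summary)
        if score > best_score:
            best_summary, best_score = summary, score
        if score >= threshold:
            break
    return best_summary, best_score


if __name__ == "__main__":
    fake = lambda text, attempt: "zzz" if attempt == 0 else "the cat sat"
    print(summarise_with_retry("the cat sat on the mat", fake))
```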
