The starting point
Understanding what LLMs actually are
I started with the basics: What is a large language model? How does it differ from a search engine? What does it mean to reason over text? I read documentation, studied how tokens and context windows work, and understood why the prompt matters so much, all before writing a single line of code.
"I Googled what a token was more than once."
First real step
How APIs work, and making my first call
I learned how REST APIs work, what requests and responses look like, how authentication with API keys works, and how to parse JSON in Python. My first Claude API call returned an error. Then I fixed it. Seeing a real LLM respond to my code for the first time was genuinely exciting.
"My first call failed because of a missing header. Classic."
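That missing header is easy to reproduce. Below is a minimal sketch of the request I eventually got right, with the endpoint and header names taken from Anthropic's documented Messages API; the model name is a placeholder, and rather than making a live network call, the sketch parses a sample JSON response the way a real one is parsed.

```python
import json

API_URL = "https://api.anthropic.com/v1/messages"

def build_request(api_key: str, prompt: str) -> dict:
    """Assemble the URL, headers, and JSON body for a single-turn call."""
    return {
        "url": API_URL,
        "headers": {
            "x-api-key": api_key,               # the header I forgot: omit it and you get a 401
            "anthropic-version": "2023-06-01",  # required API version header
            "content-type": "application/json",
        },
        "json": {
            "model": "claude-sonnet-placeholder",  # placeholder model name
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# The reply's text lives at content[0].text in the response JSON.
sample_response = json.dumps({
    "content": [{"type": "text", "text": "Hello from Claude"}],
    "stop_reason": "end_turn",
})
reply = json.loads(sample_response)["content"][0]["text"]
print(reply)
```

With the `requests` library installed, the actual call is just `requests.post(**build_request(key, prompt))`.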
The thing that changed everything
Prompt engineering is the product
I learned that the system prompt determines everything. Vague prompts give vague outputs. A prompt with exact schema, valid enum values, output format rules, and examples gives production-quality results. I rewrote my prompts dozens of times. This is where I spent the most time and learned the most.
"My first prompt said analyze this comment. That was not enough."
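The gap between those two prompts can be sketched concretely. The prompt and label taxonomy below are illustrative, not my real ones; the point is the shape: exact keys, valid enum values, output rules, one example, plus a validator that rejects replies that drift from the schema.

```python
import json

VALID_LABELS = {"safe", "harassment", "spam", "other"}

# A structured prompt: exact schema, enum values, format rules, one example.
SYSTEM_PROMPT = """You are a content safety classifier.
Return ONLY a JSON object with exactly these keys:
  "label": one of ["safe", "harassment", "spam", "other"]
  "confidence": a float between 0.0 and 1.0
  "rationale": one sentence, at most 25 words
No prose before or after the JSON.

Example:
Comment: "Buy followers now, link in bio!!!"
Output: {"label": "spam", "confidence": 0.94, "rationale": "Unsolicited promotion with a purchase link."}"""

def validate(raw: str) -> dict:
    """Parse a model reply and reject anything outside the schema."""
    obj = json.loads(raw)
    if obj.get("label") not in VALID_LABELS:
        raise ValueError(f"invalid label: {obj.get('label')!r}")
    if not 0.0 <= float(obj["confidence"]) <= 1.0:
        raise ValueError("confidence out of range")
    return obj

print(validate('{"label": "spam", "confidence": 0.94, "rationale": "Promo link."}')["label"])
```

Validating on the way out matters as much as constraining on the way in: enum checks catch the drift that "analyze this comment" invites.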
Building and deploying
From notebooks to live apps
Going from Jupyter notebooks to deployed Streamlit apps taught me more than any course. Conda environments, secrets management, absolute vs relative paths, deployment errors, requirements files. All the things that do not show up in tutorials but matter completely in practice.
"A hardcoded Windows path broke my app on the cloud. Two hours of debugging."
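The fix for that bug is small enough to show. The filename below is illustrative; the lesson is to anchor paths to the code itself rather than to whatever machine or working directory the app happens to run in.

```python
from pathlib import Path

# Broke on the cloud: this directory only exists on my laptop.
BAD_PATH = r"C:\Users\me\project\data\comments.csv"

# Portable: resolve relative to this file, with a cwd fallback for
# notebook contexts where __file__ is undefined.
BASE_DIR = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()
DATA_PATH = BASE_DIR / "data" / "comments.csv"

print(DATA_PATH.name)
```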
The insight I am most proud of
ML vs LLM: when to use which
Building my first AI project forced me to think carefully about this. ML is fast and cheap; run it on everything. LLMs are slow and expensive; reserve them for the hard cases. The disagreement between the two signals is itself the most valuable output. That systems-level thinking is what I worked hardest to develop.
"A 49% disagreement rate was not a bug. It was the whole point."
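The dual-signal idea reduces to a small loop. This is a hypothetical sketch, not my actual pipeline: both model functions are keyword stubs standing in for a fast classical classifier and a slower, context-aware LLM, and the routing logic surfaces disagreement as its own signal.

```python
def ml_flag(text: str) -> bool:
    """Stub for a fast, cheap classifier (e.g. TF-IDF + logistic regression)."""
    return "idiot" in text.lower()

def llm_flag(text: str) -> bool:
    """Stub for a slower LLM judgment that can read negation and context."""
    return "idiot" in text.lower() and "not calling you" not in text.lower()

def review(comments: list[str]) -> dict:
    """Run both signals on every comment and track where they disagree."""
    results = []
    for c in comments:
        ml, llm = ml_flag(c), llm_flag(c)
        results.append({"text": c, "ml": ml, "llm": llm, "disagree": ml != llm})
    n = len(results)
    rate = sum(r["disagree"] for r in results) / n if n else 0.0
    # Disagreements form the queue a human should look at first.
    return {"results": results, "disagreement_rate": rate}

out = review([
    "You absolute idiot.",
    "I'm not calling you an idiot, just the idea.",  # negation: the cheap signal misfires
    "Have a nice day.",
])
print(out["disagreement_rate"])
```

The disagreement rate is printed rather than discarded because, as above, it is the output worth watching.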
What I built
Three live AI systems in the Trust and Safety domain
I applied everything I learned to build three connected AI systems covering detection, operations, and investigation. Each one is live and clickable.
AI Content Safety Classifier
ML and Claude API dual-signal pipeline. 86% recall, 49% disagreement rate.
T&S Ops Dashboard
90-day ops monitoring with attack detection and Claude weekly summaries.
T&S Analyst Copilot
NL-to-SQL with agentic retry, guardrails, and chart auto-detection.
What is next
Embeddings, RAG, and deeper evaluation
Planning to study vector embeddings and retrieval-augmented generation next. I want to understand how to build systems that retrieve relevant context before generating, which would make analyst tools dramatically more powerful. Also planning to go deeper on LLM evaluation frameworks.
"Still learning. That is the honest answer."
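As a first step toward that, here is a toy sketch of the retrieve-then-generate shape. Everything in it is illustrative: real systems use learned embeddings and a vector store, while this version fakes embeddings with word counts and cosine similarity just to show where retrieval sits before generation.

```python
import math

DOCS = {
    "policy_harassment": "Harassment includes targeted insults and threats.",
    "policy_spam": "Spam includes unsolicited promotion and repeated links.",
}

def embed(text: str) -> dict[str, int]:
    """Stand-in embedding: bag-of-words counts instead of a learned vector."""
    vec: dict[str, int] = {}
    for w in text.lower().split():
        w = w.strip(".,!?")
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str) -> str:
    """Pick the most relevant doc to prepend to the generation prompt."""
    q = embed(query)
    return max(DOCS, key=lambda d: cosine(q, embed(DOCS[d])))

print(retrieve("which policy covers repeated promotional links"))
```

The generation step would then receive the retrieved policy text as context, which is exactly what would make an analyst tool answer from the right document instead of from memory.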