Open to full-time · May 2026

Data Analytics and BI Professional with an AI edge

MS Information Management at UIUC. Previously Amazon and LTIMindtree. Building production data pipelines, BI dashboards, and AI-powered analytics tools.

F-1 OPT · Available June 29, 2026 · STEM OPT eligible (3 years)
SQL Power BI Tableau Python AWS Snowflake Stakeholder Management
Ankita Saha
3.95 GPA · UIUC
2+ years experience
4M+ records processed
8 projects built
40% manual effort reduced
3 cloud certifications

Work Experience

Production analytics across audit, compliance, and BI at scale.

Amazon
Summer 2025
Business Intelligence Engineer Intern, Internal Audit
LTIMindtree
2022 to 2024
Software Engineer, BI and Compliance Analytics
The Entrepreneurship Network
2021
Technical Program Management Intern

Education

University of Illinois Urbana-Champaign
MS Information Management
Business Intelligence and Data Science specialization
Aug 2024 – May 2026 · Champaign, IL

Coursework: Data Warehousing and BI, Data Visualization, Methods of Data Science, Cloud-based Machine Learning, Database Design
GPA 3.95 / 4.0
University of Mumbai
Bachelor of Engineering, Electronics & Telecommunication
Graduated Jun 2022 · Mumbai, India

Coursework: Machine Learning, Database Management Systems, Cloud Computing
GPA 9.16 / 10.0

Projects

Data engineering, BI, cloud ML, and analytics projects built across multiple domains.

01 / Data Engineering
Healthcare · SQL · ML
Healthcare Claims Analytics Pipeline
Fact-dimension data model on 4,500+ healthcare claims with analytics-ready schemas. SQL transformations to surface claim spend by provider specialty. ML classification with SMOTE to improve minority-class detection.
4,500+ claims · zero duplicates · zero null IDs
54% recall on minority class via SMOTE balancing
End-to-end automated data quality checks
Python · SQL · scikit-learn · SMOTE · Fact-Dimension Modeling
View on GitHub
02 / BI and Cloud
Azure · Power BI · Data Warehouse
Retail Analytics Dashboard
Cloud-based star schema data warehouse on Azure for 3.4M+ Instacart orders. 5 Power BI dashboards with DAX surfacing sales trends, peak order days, and top-performing products. Self-serve analytics for stakeholders.
3.4M+ orders analyzed across Azure pipeline
5 Power BI dashboards with DAX
Identified top reorder product with 70K+ reorders
Azure Data Factory · Power BI · DAX · Azure SQL · Blob Storage
View on GitHub
03 / Cloud ML
AWS · SageMaker · Risk Modeling
Predicting Car Insurance Claims on AWS
Two-stage ML pipeline on AWS S3 and SageMaker for real-time claim risk scoring. Reduced unnecessary severity scoring by 40% through conditional execution. SHAP for interpretability and risk driver identification.
40% reduction in unnecessary severity scoring
AWS SageMaker for scalable training and inference
SHAP-based interpretability for pricing decisions
AWS SageMaker · AWS S3 · Python · SHAP · SMOTE
View on GitHub
04 / ML Research
scikit-learn · Classification · Evaluation
Breast Cancer ML Classifier Evaluation
Benchmarked and tuned 4 classifiers using cross-validation and ROC analysis. Achieved 99.1% accuracy and 0.99 ROC-AUC with tuned SVM. Compared interpretability vs performance to guide model selection.
99.1% accuracy · ROC-AUC 0.99 with tuned SVM
4 classifiers benchmarked via GridSearchCV
False negative minimization for safer diagnostics
Python · scikit-learn · GridSearchCV · ROC Analysis
View on GitHub
05 / Analytics
R · Regression · Supply Chain
Fast Fashion Supply Chain Optimization
Joined 4 supply chain datasets with 82K+ records to analyze shipping delays, warehouse selection, and logistics cost drivers. Regression model predicting total shipping and production costs across 20 warehouses and 5 factories.
82K+ records across 20 warehouses and 5 factories
Regression model with R² of 0.64
Projected 11.69% reduction in logistics costs
R · ggplot2 · dplyr · tidyr · Regression
View on GitHub

Skills

Languages and Query
SQL Python R DAX
BI and Visualization
Power BI Tableau AWS QuickSight Excel Plotly Streamlit
Cloud and Platforms
Snowflake AWS Redshift AWS S3 AWS SageMaker
Data Engineering and ML
ETL/ELT Pipelines Dimensional Modeling Star Schema Data Quality scikit-learn SHAP Anomaly Detection
Methods and Tools
Agile Stakeholder Management Jira Alteryx Advanced Excel Git/GitHub
Microsoft Azure Fundamentals · AZ-900
AWS Cloud Practitioner · CLF-C02
AWS AI Fundamentals · AIF-C01

My AI Learning Journey

I am not going to pretend I always knew this. Here is an honest account of how I went from zero to building three live AI systems, written the way I would tell it to a colleague.

The starting point
Understanding what LLMs actually are
I started with the basics: what a large language model is, how it differs from a search engine, and what it means to reason over text. I read documentation, studied how tokens and context windows work, and understood why the prompt matters so much before writing a single line of code.
"I Googled what a token was more than once."
First real step
How APIs work, and making my first call
I learned how REST APIs work, what requests and responses look like, how authentication with API keys works, and how to parse JSON in Python. My first Claude API call returned an error. Then I fixed it. Seeing a real LLM respond to my code for the first time was genuinely exciting.
"My first call failed because of a missing header. Classic."
The thing that changed everything
Prompt engineering is the product
I learned that the system prompt determines everything. Vague prompts give vague outputs. A prompt with exact schema, valid enum values, output format rules, and examples gives production-quality results. I rewrote my prompts dozens of times. This is where I spent the most time and learned the most.
"My first prompt said analyze this comment. That was not enough."
Building and deploying
From notebooks to live apps
Going from Jupyter notebooks to deployed Streamlit apps taught me more than any course. Conda environments, secrets management, absolute vs relative paths, deployment errors, requirements files. All the things that do not show up in tutorials but matter completely in practice.
"A hardcoded Windows path broke my app on the cloud. Two hours of debugging."
The insight I am most proud of
ML vs LLM: when to use which
Building my first AI project forced me to think carefully about this. ML is fast and cheap, so run it on everything; LLMs are slow and expensive, so reserve them for the hard cases. The disagreement between the two signals is itself the most valuable output. That systems-level thinking is what I worked hardest to develop.
"A 49% disagreement rate was not a bug. It was the whole point."
What I built
Three live AI systems in the Trust and Safety domain
I applied everything I learned to build three connected AI systems covering detection, operations, and investigation. Each one is live and clickable.
AI Content Safety Classifier
ML and Claude API dual-signal pipeline. 86% recall, 49% disagreement rate.
T&S Ops Dashboard
90-day ops monitoring with attack detection and Claude weekly summaries.
T&S Analyst Copilot
NL-to-SQL with agentic retry, guardrails, and chart auto-detection.
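The Copilot's guardrail idea, allowing read-only queries only, can be sketched like this. The rules are illustrative; a production guardrail would also parse the query properly and allow-list tables.

```python
import re

# Keywords that must never appear in generated SQL (illustrative list).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant)\b", re.IGNORECASE
)

def check_sql(query: str) -> bool:
    """Return True only for a single read-only SELECT statement."""
    q = query.strip().rstrip(";")
    if not q.lower().startswith("select"):
        return False
    if FORBIDDEN.search(q):
        return False
    if ";" in q:  # reject stacked statements
        return False
    return True

print(check_sql("SELECT user_id, COUNT(*) FROM reports GROUP BY user_id"))  # True
print(check_sql("DROP TABLE reports"))                                      # False
```

In the real tool a failed check feeds back into the agentic retry loop: the model is told why its SQL was rejected and asked to regenerate, rather than the query ever reaching the database.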
What is next
Embeddings, RAG, and deeper evaluation
Planning to study vector embeddings and retrieval-augmented generation next. I want to understand how to build systems that retrieve relevant context before generating, which would make analyst tools dramatically more powerful. Also planning to go deeper on LLM evaluation frameworks.
"Still learning. That is the honest answer."

Let's work together

Open to Data Analyst, Business Analyst, BI Engineer, and Analytics Engineer roles. Graduating May 2026. Available from June 29, 2026 on F-1 OPT, with STEM OPT extension eligibility for up to 3 years.