The starting point
Understanding what LLMs actually are
I started with the basics: What is a large language model? How does it differ from a search engine? What does it mean to reason over text? I read documentation, studied how tokens and context windows work, and understood why the prompt matters so much, all before writing a single line of code.
"I Googled what a token was more than once."
First real step
How APIs work, and making my first call
I learned how REST APIs work, what requests and responses look like, how authentication with API keys works, and how to parse JSON in Python. My first Claude API call returned an error. Then I fixed it. Seeing a real LLM respond to my code for the first time was genuinely exciting.
"My first call failed because of a missing header. Classic."
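That missing header is easy to reproduce. Below is a minimal sketch of the request I eventually got right, with the endpoint and header names taken from Anthropic's documented Messages API; the model name is a placeholder, and rather than making a live network call, the sketch parses a sample JSON response the way a real one is parsed.

```python
import json

API_URL = "https://api.anthropic.com/v1/messages"

def build_request(api_key: str, prompt: str) -> dict:
    """Assemble the URL, headers, and JSON body for a single-turn call."""
    return {
        "url": API_URL,
        "headers": {
            "x-api-key": api_key,               # the header I forgot: omit it and you get a 401
            "anthropic-version": "2023-06-01",  # required API version header
            "content-type": "application/json",
        },
        "json": {
            "model": "claude-sonnet-placeholder",  # placeholder model name
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# The reply's text lives at content[0].text in the response JSON.
sample_response = json.dumps({
    "content": [{"type": "text", "text": "Hello from Claude"}],
    "stop_reason": "end_turn",
})
reply = json.loads(sample_response)["content"][0]["text"]
print(reply)
```

With the `requests` library installed, the actual call is just `requests.post(**build_request(key, prompt))`.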
The thing that changed everything
Prompt engineering is the product
I learned that the system prompt determines everything. Vague prompts give vague outputs. A prompt with exact schema, valid enum values, output format rules, and examples gives production-quality results. I rewrote my prompts dozens of times. This is where I spent the most time and learned the most.
"My first prompt said analyze this comment. That was not enough."
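The gap between those two prompts can be sketched concretely. The prompt and label taxonomy below are illustrative, not my real ones; the point is the shape: exact keys, valid enum values, output rules, one example, plus a validator that rejects replies that drift from the schema.

```python
import json

VALID_LABELS = {"safe", "harassment", "spam", "other"}

# A structured prompt: exact schema, enum values, format rules, one example.
SYSTEM_PROMPT = """You are a content safety classifier.
Return ONLY a JSON object with exactly these keys:
  "label": one of ["safe", "harassment", "spam", "other"]
  "confidence": a float between 0.0 and 1.0
  "rationale": one sentence, at most 25 words
No prose before or after the JSON.

Example:
Comment: "Buy followers now, link in bio!!!"
Output: {"label": "spam", "confidence": 0.94, "rationale": "Unsolicited promotion with a purchase link."}"""

def validate(raw: str) -> dict:
    """Parse a model reply and reject anything outside the schema."""
    obj = json.loads(raw)
    if obj.get("label") not in VALID_LABELS:
        raise ValueError(f"invalid label: {obj.get('label')!r}")
    if not 0.0 <= float(obj["confidence"]) <= 1.0:
        raise ValueError("confidence out of range")
    return obj

print(validate('{"label": "spam", "confidence": 0.94, "rationale": "Promo link."}')["label"])
```

Validating on the way out matters as much as constraining on the way in: enum checks catch the drift that "analyze this comment" invites.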
Building and deploying
From notebooks to live apps
Going from Jupyter notebooks to deployed Streamlit apps taught me more than any course. Conda environments, secrets management, absolute vs relative paths, deployment errors, requirements files. All the things that do not show up in tutorials but matter completely in practice.
"A hardcoded Windows path broke my app on the cloud. Two hours of debugging."
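The fix for that bug is small enough to show. The filename below is illustrative; the lesson is to anchor paths to the code itself rather than to whatever machine or working directory the app happens to run in.

```python
from pathlib import Path

# Broke on the cloud: this directory only exists on my laptop.
BAD_PATH = r"C:\Users\me\project\data\comments.csv"

# Portable: resolve relative to this file, with a cwd fallback for
# notebook contexts where __file__ is undefined.
BASE_DIR = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()
DATA_PATH = BASE_DIR / "data" / "comments.csv"

print(DATA_PATH.name)
```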
The insight I am most proud of
ML vs LLM: when to use which
Building my first AI project forced me to think carefully about this. ML is fast and cheap; run it on everything. LLMs are slow and expensive; reserve them for the hard cases. The disagreement between the two signals is itself the most valuable output. That systems-level thinking is what I worked hardest to develop.
"A 49% disagreement rate was not a bug. It was the whole point."
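The dual-signal idea reduces to a small loop. This is a hypothetical sketch, not my actual pipeline: both model functions are keyword stubs standing in for a fast classical classifier and a slower, context-aware LLM, and the routing logic surfaces disagreement as its own signal.

```python
def ml_flag(text: str) -> bool:
    """Stub for a fast, cheap classifier (e.g. TF-IDF + logistic regression)."""
    return "idiot" in text.lower()

def llm_flag(text: str) -> bool:
    """Stub for a slower LLM judgment that can read negation and context."""
    return "idiot" in text.lower() and "not calling you" not in text.lower()

def review(comments: list[str]) -> dict:
    """Run both signals on every comment and track where they disagree."""
    results = []
    for c in comments:
        ml, llm = ml_flag(c), llm_flag(c)
        results.append({"text": c, "ml": ml, "llm": llm, "disagree": ml != llm})
    n = len(results)
    rate = sum(r["disagree"] for r in results) / n if n else 0.0
    # Disagreements form the queue a human should look at first.
    return {"results": results, "disagreement_rate": rate}

out = review([
    "You absolute idiot.",
    "I'm not calling you an idiot, just the idea.",  # negation: the cheap signal misfires
    "Have a nice day.",
])
print(out["disagreement_rate"])
```

The disagreement rate is printed rather than discarded because, as above, it is the output worth watching.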
What I built
Three live AI systems in the Trust and Safety domain
I applied everything I learned to build three connected AI systems covering detection, operations, and investigation. Each one is live and clickable.
AI Content Safety Classifier
ML and Claude API dual-signal pipeline. 86% recall, 49% disagreement rate.
T&S Ops Dashboard
90-day ops monitoring with attack detection and Claude weekly summaries.
T&S Analyst Copilot
NL-to-SQL with agentic retry, guardrails, and chart auto-detection.
What is next
Embeddings, RAG, and deeper evaluation
Planning to study vector embeddings and retrieval-augmented generation next. I want to understand how to build systems that retrieve relevant context before generating, which would make analyst tools dramatically more powerful. Also planning to go deeper on LLM evaluation frameworks.
"Still learning. That is the honest answer."
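As a first step toward that, here is a toy sketch of the retrieve-then-generate shape. Everything in it is illustrative: real systems use learned embeddings and a vector store, while this version fakes embeddings with word counts and cosine similarity just to show where retrieval sits before generation.

```python
import math

DOCS = {
    "policy_harassment": "Harassment includes targeted insults and threats.",
    "policy_spam": "Spam includes unsolicited promotion and repeated links.",
}

def embed(text: str) -> dict[str, int]:
    """Stand-in embedding: bag-of-words counts instead of a learned vector."""
    vec: dict[str, int] = {}
    for w in text.lower().split():
        w = w.strip(".,!?")
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str) -> str:
    """Pick the most relevant doc to prepend to the generation prompt."""
    q = embed(query)
    return max(DOCS, key=lambda d: cosine(q, embed(DOCS[d])))

print(retrieve("which policy covers repeated promotional links"))
```

The generation step would then receive the retrieved policy text as context, which is exactly what would make an analyst tool answer from the right document instead of from memory.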