Applied Data Science with Python

Course Information

This repository contains the course materials for UCSF DataSci 223: Applied Data Science with Python.

Course Topics (Winter 2026 - 11 Lectures)

Foundational (L01-L04)

  1. Setup + Debugging - Notebook hygiene, defensive programming, VS Code debugger
  2. Larger-than-Memory Data - Polars lazy evaluation, out-of-core processing, parquet
  3. SQL for Data Analysis - SELECT, JOIN, GROUP BY, window functions, pandas integration
  4. NLP Foundations - Text preprocessing, embeddings, sentiment, clinical text applications

ML/AI Progression (L05-L08)

  5. Classification - Train/test splits, evaluation metrics, Random Forest, XGBoost
  6. Neural Networks - MLP, CNN, RNN/LSTM, PyTorch training loop
  7. Transformers & Deep Learning - Attention mechanism, Hugging Face, tokenization
  8. LLMs - DIY -> API, Agentic & Workflows - nanoGPT walkthrough, embeddings, fine-tuning concepts, API integration, agents, prompt engineering

Applied / Student Choice (L09-L11)

09-11. Student Vote (TBD). Some options:

  - Computer Vision (transfer learning, medical imaging)
  - Visualization & Dashboards (Altair, Streamlit, MkDocs reports)
  - Time Series & Forecasting (ARIMA, ML regressors)
  - A/B Testing (causal inference, power analysis)
  - Distributed Computing (threads/processes, HPC intro)
  - End-to-End Project (CRISP-DM, capstone guidance)
  - Jobs, technical interviews, & impostor syndrome
  - Generative AI with Images
  - Feature Engineering and Selection
  - Algorithms and complexity notation
  - Local setup with the "Modern Data Stack"
  - Deploying a basic model/app to the web