STAT 451 -- Introduction to Machine Learning and Statistical Pattern Classification (Fall 2021)
- Course Topics and Calendar
- Course Description
- Course Information, Resources, and Communication
- Course Logistics
- Resources
- Grading
- Exams
- Class Project
- Other Important Course Information
- Campus Information
Course Topics and Calendar
Below is a list of the topics I am planning to cover in this course. Since the course topics are among the most frequently requested information about this course, I am placing them at the top of this page. More information about this course can be found in the sections that follow the topic list below.
Part 1: Introduction
- L01 - Course overview, introduction to machine learning
- L02 - Introduction to Supervised Learning and k-Nearest Neighbors Classifiers
Part 2: Computational foundations
- L03 - Using Python
- L04 - Introduction to Python’s scientific computing stack
- L05 - Data preprocessing and machine learning with scikit-learn
Part 3: Tree-based methods
- L06 - Decision trees
- L07 - Ensemble methods
Part 4: Model evaluation
- L08 - Model evaluation 1 – overfitting
- L09 - Model evaluation 2 – confidence intervals
- L10 - Model evaluation 3 – cross-validation and model selection
- L11 - Model evaluation 4 – algorithm selection
- L12 - Model evaluation 5 – evaluation and performance metrics
Part 5: Dimensionality reduction and unsupervised learning
- L13 - Feature selection
- L14 - Feature extraction
- L15 - Clustering
Part 6: Bayesian learning
- L16 - Introduction to Bayesian methods
- L17 - Bayes optimal classifiers
- L18 - Naive Bayes classifiers
- L19 - Bayesian networks
Part 7: Class projects
- Student project report & peer reviewing
- Student project presentations
Course Description
Credits: 3
Course Description:
Introduction to machine learning for pattern classification, regression analysis, clustering, and dimensionality reduction. For each category, fundamental algorithms, as well as selections of contemporary state-of-the-art algorithms, are discussed. The evaluation of machine learning models using statistical methods is a particular focus of this course. Statistical pattern classification approaches, including maximum likelihood estimation and Bayesian decision theory, are compared and contrasted with algorithmic and nonparametric approaches. While fundamental mathematical concepts underlying machine learning and pattern classification algorithms are taught, the practical use of machine learning algorithms using open source libraries from the Python programming ecosystem will be of equal focus in this course.
Course Requisites:
MATH 340, 341, Graduate Student Standing, or member of the Statistics Visiting International Scholars program.
Along with introducing the concepts of machine learning and pattern classification, the in-class lectures will provide a refresher on relevant concepts from calculus and linear algebra; however, a calculus background (e.g., Math 221) and a linear algebra background (e.g., Math 340) are recommended. While this course will also provide an introduction to the basics of the Python programming language for machine learning, it is highly recommended that students be familiar with basic programming and have completed an introductory programming class.
Learning Outcomes:
- Understanding the different subfields of machine learning, such as supervised and unsupervised learning and being familiar with essential algorithms from each subfield.
- Being able to identify whether machine learning is appropriate for solving a given problem task and which class of algorithms is best suited for real-world problem solving.
- Using statistical learning theory to combine multiple machine learning models via ensemble methods.
- Learning about best practices for statistical model evaluation, model selection, and algorithm comparisons, including suitable statistical hypothesis tests.
- Using contemporary programming languages and machine learning libraries for implementing machine learning algorithms such that they can be readily applied for practical problem solving.
- Connecting concepts from probability theory with supervised learning by implementing models based on Bayes’ theorem.
Course Information, Resources, and Communication
During this online semester, we will mainly be using Canvas for this course so that everything is conveniently organized in one place. This includes the course material, announcements, homework submissions, exam, and discussions. I highly recommend using Firefox or Chrome with Canvas because some features do not seem to be well supported in Safari yet.
- Questions: Outside the virtual office hours, I set up a Piazza forum for the course, which you can access through a link on Canvas. This is most efficient in case multiple students have the same or similar questions. Students are also encouraged to help other students on Piazza.
For personal questions (course accommodations due to illness, missed assignments, etc.), please choose the option “This private post is only visible to Instructors” before posting.
Course Logistics
Note that this course is offered twice this semester, and I am teaching both sections, so the contents will be identical.
When & Where
Section 1 – Lec 001:
- TuTh 9:30AM - 10:45AM in VAN HISE 459
Section 2 – Lec 002:
- TuTh 2:30PM - 3:45PM in SMI 331
Instructors
- Instructor: Dr. Sebastian Raschka
- Teaching Assistant for Lec 001: Jitian Zhao
- Teaching Assistant for Lec 002: Yanbo Shen
Office Hours
- Prof. Sebastian Raschka (Instructor):
  - Time: Tuesday 1:00 - 2:00 pm
  - Location: Medical Sciences Center room 1171 (If you enter the building through the main entrance, head straight to the elevators. Then, turn left and walk down the hallway. My office should be the 3rd or 4th door on the left.)
- Jitian Zhao (Teaching Assistant for Lec 001):
  - Time: Thursday 1:00 - 2:00 pm
  - Location: virtual via Zoom
- Yanbo Shen (Teaching Assistant for Lec 002):
  - Time: Tuesday 11:00 am - 12:00 pm
  - Location: virtual via Zoom
Overall Format and Participation
- There are two lectures each week to deliver the main course content.
- A short self-assessment quiz will be posted at the end of each lecture week (Fridays), asking conceptual questions about that week’s lecture contents. The quiz will be due on the Friday of the following week.
- There will be 3-5 hands-on homework assignments involving coding, which will be posted approximately every 3 weeks.
- Starting after the first few weeks of the semester, students will form teams of three to work and collaborate on an individual class project throughout the semester. Students should meet on a regular and weekly basis to make progress towards their project goals.
Resources
I will link resources, including internet articles and research articles that are relevant for the course. The book suggestions are recommendations but not requirements.
Machine Learning Books
Python Machine Learning, 3rd Edition (highly recommended)
- Raschka, S., & Mirjalili, V. (2019). Python Machine Learning, 3rd Ed. Birmingham, UK: Packt Publishing. ISBN-13: 978-1789955750
- Many of the hands-on code examples, topics, and figures discussed in class were adopted from this book; hence, it is highly recommended to read through the chapters in this book.
- Code examples and figures are freely available online under an open source license at https://github.com/rasbt/python-machine-learning-book-3rd-edition.
Python Resources
Regarding Python, we will mainly focus on two libraries: NumPy and Scikit-learn. You can think of NumPy as a linear algebra library that provides utilities similar to MATLAB (if you are familiar with MATLAB). It is used in almost every scientific computing task and by many other Python libraries, so it is generally useful to know. Scikit-learn is the main machine learning library we will be using.
In any case, you don’t need to be an expert Python programmer to use these libraries (and I will teach you about Scikit-learn in this course, so no worries about learning it beforehand). However, some basic familiarity with Python will be necessary in order to use these libraries.
Python for Beginners (Video Lectures)
A great video series by educators at Microsoft, which was recently made available for free on YouTube: https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6.
Learn Python (Interactive Tutorials)
On https://www.learnpython.org/, you can find interactive tutorials that help you learn Python through a sequence of coding exercises.
Illustrated Guide to Python (Book)
- “Illustrated Guide to Python 3: A Complete Walkthrough of Beginning Python with Unique Illustrations Showing how Python Really Works. Now covering Python 3.6 (Treading on Python) (Volume 1)” by Matt Harrison, ISBN-13: 978-1977921758.
Another great book is Allen Downey’s Think Python 2e (free PDF available at https://greenteapress.com/wp/think-python-2e/).
Python Like You Mean It
A short, free intro for getting started with Python and its main scientific computing libraries: https://www.pythonlikeyoumeanit.com.
Grading
The final grade will be computed using the following weighted grading scheme:
- 30% Problem Sets (Homeworks and quizzes)
- 20% Midterm Exam
- 50% Class Project:
  - 5% Project proposal
  - 20% Project presentation
  - 25% Project report
The final letter grade will be based on the percent of the total points accumulated in the course. The proposed grade cut-offs are as follows:
- A: >= 93%
- AB: >= 90%
- B: >= 85%
- BC: >= 80%
- C: >= 70%
- D: >= 50%
- F: < 50%
However, please note that the grades are subject to curving, and the cut-offs may be adjusted so that the grade distribution is similar to that of previous semesters.
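To illustrate how the weights and proposed cut-offs above combine, here is a minimal Python sketch. It assumes that each component is scored as a percentage between 0 and 100 and that no curving is applied; the function names, dictionary keys, and example scores are hypothetical and only serve as an illustration, not as the official grade calculation.

```python
# Weights (in percent) taken from the weighted grading scheme above.
WEIGHTS = {
    "problem_sets": 30,
    "midterm": 20,
    "project_proposal": 5,
    "project_presentation": 20,
    "project_report": 25,
}

# Proposed cut-offs, highest first. Curving may shift these in practice.
CUTOFFS = [(93, "A"), (90, "AB"), (85, "B"), (80, "BC"), (70, "C"), (50, "D")]


def final_percentage(scores):
    """Weighted average of component scores (each given in percent)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percentage):
    """Map a final percentage to a letter grade using the proposed cut-offs."""
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical example scores (percent per component).
example_scores = {
    "problem_sets": 90,
    "midterm": 85,
    "project_proposal": 100,
    "project_presentation": 95,
    "project_report": 88,
}

total = final_percentage(example_scores)
print(total, letter_grade(total))
```

Note that the three project components together make up the 50% class project weight, which is why they appear as separate entries in the sketch.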
Exams
The midterm exam will be the only exam in this class. There is no final exam. The midterm exam will take place during a regular lecture day in the lecture room on Thursday, October 21st.
Class Project
The goal of working on a class project is three-fold. First, it will provide you with the opportunity to apply the concepts learned in this class creatively, which helps you understand the material more deeply. Second, designing and working on a unique project in a team is something that you will encounter, if you haven’t already, sooner rather than later in life, and this course project helps you prepare for that. Third, along with the opportunity to practice and the satisfaction of working creatively, students can use this project to enhance their portfolio or resume.
The project consists of 3 parts:
- a project proposal,
- a short project presentation,
- and a project report.
For the class project, you will be working in teams of three.
The project proposal is a short 2-3 page report outlining your plans; it will be due on March 1st. The goal of this proposal is to share your plans with me (the instructor) so that I can provide constructive feedback.
The project presentation will be a (pre-recorded) oral presentation (approx. 8-10 min) that is due at the end of the semester. The goal of this presentation is to practice summarizing and communicating your project to an audience.
The final report is an 8-page report in a conference paper format. It will be submitted at the end of the semester. The purpose of this report is to provide a comprehensive and professional description of your project.
As part of this project experience, each student will also be peer-reviewing 3 talks and 3 reports. More details about the report format and procedures will be shared later in this semester. Examples will be provided.
Other Important Course Information
Late Submission Policy
Homework, quizzes, and projects that are submitted late will be penalized as follows:
- Submitted within 6 hours of the deadline: 10% deduction from the maximum possible points.
- Submitted between 6 and 24 hours after the deadline: 20% deduction from the maximum possible points.
- Submitted more than 24 hours late: No points.
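As a rough illustration of the late-submission tiers listed above, the following Python sketch computes a score after the deduction. It rests on two assumptions that the policy does not spell out: a submission exactly 6 or 24 hours late falls into the more lenient tier, and the deduction is subtracted as a share of the maximum possible points. The function names are hypothetical.

```python
def late_deduction_fraction(hours_late):
    """Fraction of the maximum possible points deducted for a late submission."""
    if hours_late <= 0:
        return 0.0   # submitted on time
    if hours_late <= 6:
        return 0.10  # within 6 hours of the deadline
    if hours_late <= 24:
        return 0.20  # between 6 and 24 hours after the deadline
    return 1.0       # more than 24 hours late: no points


def score_after_deduction(raw_points, max_points, hours_late):
    """Apply the deduction to the raw score, never going below zero."""
    deduction = late_deduction_fraction(hours_late) * max_points
    return max(0.0, raw_points - deduction)


# Hypothetical example: 45 of 50 points earned, submitted 3 hours late.
print(score_after_deduction(45, 50, 3))
```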
Rules, Rights & Responsibilities
See the Guide’s Rules, Rights and Responsibilities.
Academic Integrity
By enrolling in this course, each student assumes the responsibilities of an active participant in UW-Madison’s community of scholars in which everyone’s academic work and behavior are held to the highest academic integrity standards. Academic misconduct compromises the integrity of the university. Cheating, fabrication, plagiarism, unauthorized collaboration, and helping others commit these acts are examples of academic misconduct, which can result in disciplinary action. This includes but is not limited to failure on the assignment/course, disciplinary probation, or suspension. Substantial or repeated cases of misconduct will be forwarded to the Office of Student Conduct & Community Standards for additional review. For more information, refer to studentconduct.wiscweb.wisc.edu/academic-integrity/.
Accommodations for Students with Disabilities
McBurney Disability Resource Center syllabus statement: “The University of Wisconsin-Madison supports the right of all enrolled students to a full and equal educational opportunity. The Americans with Disabilities Act (ADA), Wisconsin State Statute (36.12), and UW-Madison policy (Faculty Document 1071) require that students with disabilities be reasonably accommodated in instruction and campus life. Reasonable accommodations for students with disabilities is a shared faculty and student responsibility. Students are expected to inform faculty [me] of their need for instructional accommodations by the end of the third week of the semester, or as soon as possible after a disability has been incurred or recognized. Faculty [I], will work either directly with the student [you] or in coordination with the McBurney Center to identify and provide reasonable instructional accommodations. Disability information, including instructional accommodations as part of a student’s educational record, is confidential and protected under FERPA.” http://mcburney.wisc.edu/facstaffother/faculty/syllabus.php
Diversity and Inclusion
Institutional statement on diversity: “Diversity is a source of strength, creativity, and innovation for UW-Madison. We value the contributions of each person and respect the profound ways their identity, culture, background, experience, status, abilities, and opinion enrich the university community. We commit ourselves to the pursuit of excellence in teaching, research, outreach, and diversity as inextricably linked goals.
The University of Wisconsin-Madison fulfills its public mission by creating a welcoming and inclusive community for people from every background – people who as students, faculty, and staff serve Wisconsin and the world.” https://diversity.wisc.edu/
Campus Information
Campus set up a website with guidelines about fall instruction and COVID-19 related information, which you can find here: https://teachlearn.provost.wisc.edu/fall-2021-instruction/.