COSC 375: Data Science

Spring 2024

Course Resources
General
  • Syllabus
  • Modern Data Science with R: Our textbook, which provides a broad overview of the field using R as the language of choice.
  • R: A programming language that excels at data analytics.
  • RStudio: An IDE that is built for working in R and data science.
  • DataCamp: An online platform for learning data science. You’ll need to create an account using your Wofford email address.
Data Science
  • Kaggle: A great website for finding and sharing datasets, joining competitions, etc.
  • Regex Crossword: A set of games designed to help you learn and practice using regular expressions.
  • RegExr: An online tool to learn, build, and test regular expressions.
  • Data Science Prep: A service that sends real data science interview questions to your email inbox so you can practice for interviews.
  • Quarto: An open-source scientific and technical publishing system. Use it to create dynamic content, author documents, and publish articles, reports, presentations, websites, blogs, and books in formats such as HTML, PDF, MS Word, and ePub. Its scientific markdown supports equations, citations, cross-references, figure panels, callouts, and advanced layout.
R / RStudio
  • CRAN Package List: A list of the 20,000+ R packages that can be installed using the install.packages() function.
  • Posit Cheatsheet Collection: An excellent collection of printable cheatsheets on various R-based data science tools (RStudio, Quarto, ggplot2, shiny, etc.).
  • Tidyverse: An opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
  • ggplot2
  • Tidymodels: A collection of packages for modeling and machine learning using tidyverse principles.
Books
Code Examples
Course Schedule
Week Date Topics Due
1 Feb 5th (Monday)

Course Introduction

For next time: Read the syllabus, download and install R and RStudio, sign up on DataCamp using the invitation link, and start working on your first DataCamp assignment.

Feb 7th (Wednesday)

The R Programming Language / RStudio

For next time: Become familiar with R and RStudio. Your first DataCamp assignment is due Friday night.

Feb 9th (Friday) The R Programming Language / RStudio DC: Introduction to R
2 Feb 12th (Monday) Data Visualization with ggplot2
Feb 14th (Wednesday)

Work Day

Use this time to work on the DataCamp courses.

DC: Intermediate R

Read Chapter 1

Feb 16th (Friday) Data Visualization with ggplot2 DC: Introduction to the Tidyverse
3 Feb 19th (Monday) Data Transformation with dplyr Read Chapter 2
Feb 21st (Wednesday) Data Transformation with dplyr DC: Introduction to Data Visualization with ggplot2
Feb 23rd (Friday) Kaggle DC: Intermediate Data Visualization with ggplot2
4 Feb 26th (Monday)

Visualizing with Color

Note: Exam 1 is due March 6th. To complete the exam, finish any DataCamp Project that is in R and labeled “Guided”, then submit proof of completion (such as a screenshot) to Moodle by the due date.

Read Chapter 3
Feb 28th (Wednesday) Strings and Regular Expressions with stringr DC: Data Manipulation with dplyr
Mar 1st (Friday) Strings and Regular Expressions with stringr DC: Joining Data with dplyr
5 Mar 4th (Monday) Strings and Regular Expressions with stringr Read Chapter 4
Mar 6th (Wednesday)

Reproducible Reports with Quarto

Note: Exam 1 is due tonight! Please make sure you have your proof of completion uploaded to Moodle by 11:59PM.

Exam 1: Any “Guided” R Project on DataCamp
Mar 8th (Friday) Reproducible Reports with Quarto DC: Introduction to Importing Data in R
6 Mar 11th (Monday) Dates and Times with lubridate
Mar 13th (Wednesday) Dates and Times with lubridate DC: Introduction to Statistics in R
Mar 15th (Friday) Factors with forcats DC: Data Communication Concepts
7 Mar 18th (Monday) Databases with dbplyr
Mar 20th (Wednesday)

Databases with dbplyr

Note: No class on Friday. Please use this time to catch up (or get a head start) on some DataCamp courses.

DC: Working with Dates and Times in R
Mar 22nd (Friday) No Class (SIGCSE 2024) DC: Cleaning Data in R
8 Mar 25th (Monday) Machine Learning: Decision Trees
Mar 27th (Wednesday) Machine Learning: Kaggle Competitions DC: Introduction to Writing Functions in R
Mar 29th (Friday) Work Day DC: Exploratory Data Analysis in R
9 Apr 1st (Monday) No Class (Spring Holiday)
Apr 3rd (Wednesday) No Class (Spring Holiday)
Apr 5th (Friday) No Class (Spring Holiday)
10 Apr 8th (Monday) Machine Learning: k-Nearest Neighbors (k-NN)
Apr 10th (Wednesday)

Machine Learning: Random Forests with tidymodels

Note: Exam 2 is due tonight! Please make sure you have your proof of completion uploaded to Moodle by 11:59PM.

Exam 2: Any “Unguided” R Project on DataCamp
Apr 12th (Friday) Machine Learning: Naive Bayes DC: Introduction to Regression in R
11 Apr 15th (Monday) Machine Learning: k-Means
Apr 17th (Wednesday) Shiny DC: Intermediate Regression in R
Apr 19th (Friday) Shiny DC: Supervised Learning in R: Classification
12 Apr 22nd (Monday)

Shiny

Note: Exam 3 is due May 10th. Please make sure you have your proof of completion uploaded to Moodle by 11:59PM on that date.

Apr 24th (Wednesday) Time Series DC: Supervised Learning in R: Regression
Apr 26th (Friday) DC: Unsupervised Learning in R
13 Apr 29th (Monday)
May 1st (Wednesday) DC: Sampling in R
May 3rd (Friday) DC: Hypothesis Testing in R
14 May 6th (Monday)
May 8th (Wednesday) DC: Experimental Design in R
May 10th (Friday) Exam 3: Any “Guided” OR “Unguided” R Project on DataCamp
15 May 15th (Wednesday) Final Presentations (11:30AM - 2:00PM)