COSC 375: Data Science

Spring 2025

Course Resources
General
  • Syllabus
  • Modern Data Science with R: Our textbook, which provides a broad overview of the field using R as the language of choice
  • DataCamp: An online platform for learning data science - you’ll need to create an account using your Wofford email address
Upcoming Events
  • useR! 2025 Conference: An R conference that will be held August 8th-10th at Duke University in Durham, North Carolina (registration opens March 3rd)
  • The National Consortium for Data Science (Upcoming Events)
General Data Science
  • Data Science Prep: Get exceptionally good at data science interviews by getting real interview questions in your email inbox
  • Kaggle: A great website for finding and sharing datasets, joining competitions, etc.
  • Regex Crossword: A set of games designed to help you learn and practice using regular expressions.
  • RegExr: An online tool to learn, build, and test regular expressions.
  • Quarto: An open-source scientific and technical publishing system. It supports creating dynamic content; publishing articles, reports, presentations, websites, blogs, and books in formats such as HTML, PDF, MS Word, and ePub; and authoring with scientific markdown, including equations, citations, crossrefs, figure panels, callouts, and advanced layout.
R / RStudio / IDEs
  • R: A programming language that excels at data analytics
  • IDEs (Integrated Development Environments)
    • RStudio: An IDE that is built for working in R and data science
    • Positron: A next-generation data science IDE (it’s not yet finished, and is available in beta only)
    • Visual Studio Code (VS Code): A general-purpose IDE from Microsoft that supports many programming languages
  • CRAN Package List: A list of the 20,000+ R packages that can be installed using the install.packages() function
  • Tidyverse: An opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures
  • Posit Cheatsheet Collection: An excellent collection of printable cheatsheets on various R-based data science tools (RStudio, Quarto, ggplot2, shiny, etc.)
  • ggplot2
  • Tidymodels: A collection of packages for modeling and machine learning using tidyverse principles
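The CRAN entry above mentions install.packages(). As a minimal sketch (not part of the course materials), the helpers below install a package only when it is not already present. The names missing_packages() and install_if_missing() are hypothetical; they only wrap base R's requireNamespace() and install.packages().

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# Note: requireNamespace() loads a package's namespace as a side effect
# when the package is found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the packages that are missing.
# Invisibly returns the names of the packages it tried to install.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Base packages ship with R, so none of these should be reported as missing.
missing_packages(c("stats", "utils"))

# Uncomment to install the tidyverse only if it is not already present:
# install_if_missing("tidyverse")
```

Checking first avoids re-downloading a large collection such as the tidyverse every time a script is run.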
Books
Code Examples



Course Schedule
Week Date Topics Due
1 Feb 3rd (Monday)

Course Introduction

For next time: Read the syllabus, download and install R and RStudio, sign up on DataCamp using the invitation link above, and start working on your first DataCamp assignment.

Feb 5th (Wednesday)

The R Programming Language / RStudio

For next time: Continue learning your way around RStudio, and play around with R. We will continue with more R next time. Finish your first DataCamp course by Friday night (11:59PM). You will usually have three DataCamp courses every two weeks.

Feb 7th (Friday)

The R Programming Language / RStudio

For next time: Continue learning your way around RStudio, and play around with R. Your first DataCamp course is due tonight, and your second course is due Wednesday night. Courses are always due at 11:59PM on their due date. We will move into data visualization next time.

DC: Introduction to R
2 Feb 10th (Monday)

Data Visualization with ggplot2

For next time: We will continue learning about data visualization using the ggplot2 package. Since we are going to be using several packages in the tidyverse, I recommend installing the entire tidyverse by running install.packages("tidyverse").

Feb 12th (Wednesday)

Data Visualization with ggplot2

For next time: We will continue learning about data visualization using the ggplot2 package.

DC: Intermediate R

Read Chapter 1

Feb 14th (Friday)

Data Visualization with ggplot2 / Kaggle

For next time: We will finish up the introduction to data visualization using ggplot2, and start moving into data wrangling with the dplyr package (which is part of the tidyverse).

3 Feb 17th (Monday)

Color / Data Transformation with dplyr

For next time: We will continue learning about data wrangling using dplyr. Remember to keep up with the DataCamp courses.

DC: Introduction to the Tidyverse
Feb 19th (Wednesday)

Data Transformation with dplyr

For next time: We will continue learning about data wrangling using dplyr. Your next DataCamp course is due Friday night.

Read Chapter 2
Feb 21st (Friday)

Data Transformation with dplyr

For next time: We will start moving into text data. Your next DataCamp course is due Monday night (due to an error on my part when selecting the due date). Your Wednesday DC course is still due Wednesday.

DC: Introduction to Data Visualization with ggplot2
4 Feb 24th (Monday)

Strings and Regular Expressions with stringr

For next time: We will continue learning about regular expressions. Your next DataCamp course is due tonight (due to an error on my part when selecting the due date). Your Wednesday DC course is still due Wednesday.

Feb 26th (Wednesday)

Strings and Regular Expressions with stringr

For next time: Your next DataCamp course is due tonight, and I’ve already assigned the two for next week. I’ll be at a conference on Thursday and Friday this week, so we will not have class on Friday. Use the time to work on the DataCamp courses.

DC: Intermediate Data Visualization with ggplot2
Feb 28th (Friday) SIGCSE 2025 - NO CLASS
5 Mar 3rd (Monday)

Reproducible Reports with Quarto

For next time: Keep up, as always, with the DataCamp courses. Our first “exam” is not really an exam, but a hands-on project. Choose any “real-world project” in R on DataCamp, complete it, and submit proof of completion (such as a screenshot) to Moodle by next Monday.

DC: Data Manipulation with dplyr
Mar 5th (Wednesday)

Reproducible Reports with Quarto

For next time: We will do more with Quarto. Keep up with everything on DataCamp.

Mar 7th (Friday)

NO CLASS

For next time: Finish the DataCamp course due tonight, and make sure to finish a real-world project on DataCamp by Monday night.

DC: Joining Data with dplyr
6 Mar 10th (Monday)

Dates and Times with lubridate

For next time: Complete a real-world DataCamp project by tonight, and submit proof of completion to Moodle. Even a screenshot of the “replay” button for a project is sufficient.

Exam 1: Any Real-world R Project on DataCamp
Mar 12th (Wednesday)

Dates and Times with lubridate

For next time: You have a DataCamp course due tonight. We will finish up the lubridate package next time.

DC: Introduction to Importing Data in R
Mar 14th (Friday)

Dates and Times with lubridate

For next time: We will move into learning a little about databases next time.

7 Mar 17th (Monday)

Databases with dbplyr

For next time: You have a DataCamp course due tonight, and one on Friday.

DC: Introduction to Statistics in R
Mar 19th (Wednesday)

Databases with dbplyr

For next time: Finish the DataCamp course due Friday before break. We will do a somewhat random topic next time, since a few are traveling.

Mar 21st (Friday)

Computer Vision

For next time: We will start diving into the world of machine learning when we get back from break.

DC: Data Communication Concepts
8 Mar 24th (Monday) SPRING HOLIDAY - NO CLASS
Mar 26th (Wednesday) SPRING HOLIDAY - NO CLASS
Mar 28th (Friday) SPRING HOLIDAY - NO CLASS
9 Mar 31st (Monday)

Machine Learning: Decision Trees

For next time: We will spend the next two weeks or so on machine learning topics. I have gone ahead and put up your DataCamp courses for the next three weeks for those who want to work ahead.

Apr 2nd (Wednesday) Machine Learning: Kaggle Competitions DC: Working with Dates and Times in R
Apr 4th (Friday)
10 Apr 7th (Monday) DC: Cleaning Data in R
Apr 9th (Wednesday)
Apr 11th (Friday) DC: Introduction to Writing Functions in R
11 Apr 14th (Monday)
Apr 16th (Wednesday) DC: Exploratory Data Analysis in R
Apr 18th (Friday)
12 Apr 21st (Monday)
Apr 23rd (Wednesday)
Apr 25th (Friday)
13 Apr 28th (Monday)
Apr 30th (Wednesday)
May 2nd (Friday)
14 May 5th (Monday)
May 7th (Wednesday)
May 9th (Friday)
15 May 14th (Wednesday) Final Exam (11:30AM - 2:00PM)