Data Science

COSC 375

📋 General

  • Syllabus
  • R for Data Science (2nd Edition): Our textbook, which provides a broad overview of the field using R as the language of choice
  • DataCamp: An online platform for learning data science; you'll need to create an account using your Wofford email address

โฐ Upcoming Events

📊 General Data Science

  • Data Science Prep: Get exceptionally good at data science interviews by getting real interview questions in your email inbox
  • Kaggle: A great website for finding and sharing datasets, joining competitions, etc.
  • Regex Crossword: A set of games designed to help you learn and practice using regular expressions.
  • RegExr: An online tool to learn, build, and test regular expressions.
  • Quarto: An open-source scientific and technical publishing system for authoring dynamic articles, reports, presentations, websites, blogs, and books in formats such as HTML, PDF, MS Word, and ePub, with scientific markdown support for equations, citations, cross-references, figure panels, callouts, and advanced layout.
  • Visualizing k-Means Clustering: A nifty visualization tool for demonstrating the k-Means algorithm.

๐Ÿ› ๏ธ R/IDEs

  • R: A programming language that excels at data analytics
  • IDEs (Integrated Development Environments)
    • RStudio: An IDE that is built for working in R and data science
    • Positron: A next-generation data science IDE
    • Visual Studio Code (VS Code): A general-purpose code editor from Microsoft that supports almost any programming language
  • CRAN Package List: A list of the 20,000+ R packages that can be installed using the install.packages() function
  • Posit Cheatsheet Collection: An excellent collection of printable cheatsheets on various R-based data science tools (RStudio, Quarto, ggplot2, shiny, etc.)
  • Tidyverse: An opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures
  • ggplot2: A Tidyverse package for creating graphics based on the grammar of graphics
  • Tidymodels: A collection of packages for modeling and machine learning using tidyverse principles

๐Ÿ–ฅ๏ธ Code Examples

📅 Course Schedule

Week Date Topics Due
1 Feb 4th (Wednesday)

Course Introduction

For next time: Read the syllabus, download and install R and Positron (or RStudio), sign up on DataCamp using the invitation link above, and start working on your first DataCamp assignment.

Feb 6th (Friday)

The R Programming Language / Positron

For next time: Continue learning your way around Positron, and play around with R. We will continue with more R next time. Finish your first DataCamp course by Monday night (11:59 PM). You will usually have three DataCamp courses every two weeks.

2 Feb 9th (Monday)

The R Programming Language / Positron

For next time: Continue learning your way around Positron, and play around with R. Your first DataCamp course is due tonight, and your second course is due Friday night. Courses are always due at 11:59 PM on their due date. We will move into data visualization next time.

  • DC: Introduction to R
Feb 11th (Wednesday)

Data Visualization with ggplot2

For next time: We will continue learning about data visualization using the ggplot2 package. Since we are going to be using several packages in the Tidyverse, I recommend installing the entire Tidyverse by running install.packages("tidyverse"). Start reading Chapters 1 and 2 of your textbook.

Feb 13th (Friday)

Data Visualization with ggplot2

For next time: We will continue learning about data visualization using the ggplot2 package. Don't forget to keep up with the DataCamp courses.

  • DC: Intermediate R
3 Feb 16th (Monday)

Data Visualization with ggplot2 / Kaggle

For next time: We will finish up the introduction to data visualization using ggplot2, and start moving into data wrangling with the dplyr package (which is part of the tidyverse). Read Chapter 1 of your textbook.

Feb 18th (Wednesday)

Color / Data Transformation with dplyr

For next time: We will continue learning about data wrangling using dplyr. Start reading Chapter 3 of your textbook. Remember to keep up with the DataCamp courses.

  • DC: Introduction to the Tidyverse
  • Read Chapters 1 and 2
Feb 20th (Friday)

Data Transformation with dplyr

For next time: We will continue learning about data wrangling using dplyr. Your next DataCamp course is due Monday night.

4 Feb 23rd (Monday)

Data Transformation with dplyr

For next time: We will start moving into text data. Your next DataCamp course is due tonight.

  • DC: Introduction to Data Visualization with ggplot2
Feb 25th (Wednesday)

Strings and Regular Expressions with stringr

For next time: We will continue learning about regular expressions.

Feb 27th (Friday)

Strings and Regular Expressions with stringr

For next time: Your next DataCamp course is due tonight, and I've already assigned the one for next week.

  • DC: Intermediate Data Visualization with ggplot2
5 Mar 2nd (Monday)

Dates and Times with lubridate

For next time: You have a DataCamp course due Wednesday. We will finish up the lubridate package next time.

Mar 4th (Wednesday)

Dates and Times with lubridate

For next time: You have a DataCamp course due tonight. We will move into a new topic next time (Quarto).

  • DC: Data Manipulation with dplyr
Mar 6th (Friday)

Reproducible Reports with Quarto

For next time: We will do more with Quarto. Your next course is due Monday night.

6 Mar 9th (Monday)

Reproducible Reports with Quarto

For next time: You have a DataCamp course due tonight. We will move into a new topic next time (databases).

  • DC: Joining Data with dplyr
Mar 11th (Wednesday)

Databases with dbplyr

For next time: Keep up with your DataCamp courses. We will continue with databases next time.

Mar 13th (Friday)

Databases with dbplyr

For next time: Finish the DataCamp course tonight. We will take a brief look at the topic of computer vision next time.

  • DC: Introduction to Importing Data in R
7 Mar 16th (Monday)

Computer Vision

For next time: We will start diving into the world of machine learning next time.

Mar 18th (Wednesday)

Machine Learning: Decision Trees

For next time: Finish your DataCamp real-world project (Exam 1). You have a DataCamp course due tonight.

  • DC: Introduction to Statistics in R
Mar 20th (Friday)

Machine Learning: Kaggle Competitions

For next time: Finish your DataCamp real-world project (Exam 1) by tonight. We will do more machine learning after break.

  • Exam 1: Any Real-world R Project on DataCamp
8 Mar 23rd (Monday) SPRING HOLIDAY - NO CLASS
Mar 25th (Wednesday) SPRING HOLIDAY - NO CLASS
Mar 27th (Friday) SPRING HOLIDAY - NO CLASS
9 Mar 30th (Monday)

Machine Learning: k-Nearest Neighbors (k-NN)

For next time: Your next DataCamp course is due Friday. We will explore tidymodels next time.

Apr 1st (Wednesday)

Machine Learning: Random Forests with tidymodels

For next time: Finish your DataCamp course by Friday night. We will do a few more topics in machine learning.

Apr 3rd (Friday)

Machine Learning: Naive Bayes

For next time: A DataCamp course is due tonight. We will look at one more machine learning model before moving into something new.

  • DC: Data Communication Concepts
10 Apr 6th (Monday)

Machine Learning: k-Means

For next time: You have one DataCamp course due this week, and I'll go ahead and put up the two for next week. I'm also assigning your “Exam 2”, which is just completing another DataCamp project (any real-world project in R). We will start Shiny next time.

Apr 8th (Wednesday)

Shiny

For next time: Finish your DataCamp course by tonight. We will look at more Shiny next time.

  • DC: Working with Dates and Times in R
Apr 10th (Friday)

Shiny

For next time: Note that, along with your other DataCamp courses, your second real-world project (“Exam 2”) is due this week. We will continue learning about interactive dashboards next time (using Shiny).

11 Apr 13th (Monday)

Shiny

For next time: You have a DataCamp course due tonight, but also remember to complete a second real-world project (which is your “Exam 2”), and upload proof of completion to Moodle by Wednesday night. We will look at time series / forecasting next time.

  • DC: Cleaning Data in R
Apr 15th (Wednesday)

Time Series / Forecasting

For next time: Submit your Exam 2 by tonight. We will look at geospatial mapping next time.

  • Exam 2: Another Real-world R Project on DataCamp
Apr 17th (Friday)

Geospatial Mapping

For next time: You have a DataCamp course due tonight.

  • DC: Introduction to Writing Functions in R
12 Apr 20th (Monday)

Text Analysis

For next time: You have one DataCamp course due this week (Wednesday night). I've gone ahead and put up the two for next week.

Apr 22nd (Wednesday)
  • DC: Exploratory Data Analysis in R
Apr 24th (Friday)
13 Apr 27th (Monday)
  • DC: Introduction to Regression in R
Apr 29th (Wednesday)
May 1st (Friday)
  • DC: Supervised Learning in R: Classification
14 May 4th (Monday)
May 6th (Wednesday)
May 8th (Friday)
15 May 13th (Wednesday) Final Presentation (11:30 AM - 2:00 PM)