📋 General
- Syllabus
- Modern Data Science with R: Our textbook, which provides a broad overview of the field using R as the language of choice
- DataCamp: An online platform for learning data science - you’ll need to create an account using your Wofford email address
- Final Presentation Signup Sheet: Use this link to sign up for your final presentation
⏰ Upcoming Events
- useR! 2025 Conference: An R conference that will be held August 8th-10th at Duke University in Durham, North Carolina (registration opens March 3rd)
- The National Consortium for Data Science (Upcoming Events)
📊 General Data Science
- Data Science Prep: Get exceptionally good at data science interviews by getting real interview questions in your email inbox
- Kaggle: A great website for finding and sharing datasets, joining competitions, etc.
- Regex Crossword: A set of games designed to help you learn and practice using regular expressions.
- RegExr: An online tool to learn, build, and test regular expressions.
- Quarto: An open-source scientific and technical publishing system for creating dynamic content and publishing articles, reports, presentations, websites, blogs, and books in formats such as HTML, PDF, MS Word, and ePub. It supports scientific markdown, including equations, citations, cross-references, figure panels, callouts, and advanced layout.
- Visualizing k-Means Clustering: A nifty visualization tool for demonstrating the k-Means algorithm.
- Data Visualization Ethics
🛠️ R/RStudio/IDEs
- R: A programming language that excels at data analytics
- IDEs (Integrated Development Environments)
  - RStudio: An IDE built for working in R and data science
  - Positron: A next-generation data science IDE (it’s not yet finished, and is available in beta only)
  - Visual Studio Code (VS Code): A general-purpose IDE from Microsoft that supports many programming languages
- CRAN Package List: A list of the 20,000+ R packages that can be installed using the install.packages() function
- Tidyverse: An opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures
- Posit Cheatsheet Collection: An excellent collection of printable cheatsheets on various R-based data science tools (RStudio, Quarto, ggplot2, shiny, etc.)
- ggplot2
- Documentation: A guide and reference on how to use the ggplot2 package in R
- ggplot2 Extensions: Browse extensions that add additional functionality to ggplot2
- Top 50 ggplot2 Visualizations: 50 chart examples with full R code provided
- Color List: A list of color names that you can use in R
- Custom Colors: A useful guide on how to use different color palettes (or your own colors) in ggplot2
- Tidymodels: A collection of packages for modeling and machine learning using tidyverse principles
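
As a minimal sketch of how install.packages() is typically used with the packages listed above: the two helper functions below are hypothetical (they are not part of the course materials or of any package), and they assume the package names passed in exist on CRAN. The idea is to check which packages are already installed and only install the ones that are missing.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the missing packages from CRAN.
# Requires an internet connection; invisibly returns the names it tried to install.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Example call (commented out so nothing is installed by accident):
# install_if_missing(c("tidyverse", "tidymodels"))
```

After a package is installed, library(tidyverse) (or any other package name) loads it for the current R session. Installing is a one-time step, while loading is needed in each new session.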
🖥️ Code Examples
- Regular expressions (regex.R)
- Dates and times (dt.R)
- Databases (db.R)
- Computer vision (cv.R)
- Machine Learning
- Shiny Demos
- Forecasting (forecast.R)
- Geospatial mapping (world_map.R, us_map.R)
- Text Analysis of Macbeth (macbeth.R)
- Earthquake Explorer (earthquake_explorer.R)
- Sentiment Explorer ()
📅 Course Schedule
| Week | Date | Topics | Due |
|---|---|---|---|
| 1 | Feb 3rd (Monday) | Course Introduction. For next time: Read the syllabus, download and install R and RStudio, sign up on DataCamp using the invitation link above, and start working on your first DataCamp assignment. | |
| | Feb 5th (Wednesday) | The R Programming Language / RStudio. For next time: Continue learning your way around RStudio, and play around with R. We will continue with more R next time. Finish your first DataCamp course by Friday night (11:59PM). You will usually have 3 DataCamp courses every two weeks. | |
| | Feb 7th (Friday) | The R Programming Language / RStudio. For next time: Continue learning your way around RStudio, and play around with R. Your first DataCamp course is due tonight, and your second course is due Wednesday night. Courses are always due at 11:59PM on their due date. We will move into data visualization next time. | DC: Introduction to R |
| 2 | Feb 10th (Monday) | Data Visualization with ggplot2. For next time: We will continue learning about data visualization using the ggplot2 package. Since we are going to be using several packages in the tidyverse, I recommend installing the entire tidyverse. | |
| | Feb 12th (Wednesday) | Data Visualization with ggplot2. For next time: We will continue learning about data visualization using the ggplot2 package. | DC: Intermediate R; Read Chapter 1 |
| | Feb 14th (Friday) | Data Visualization with ggplot2 / Kaggle. For next time: We will finish up the introduction to data visualization using ggplot2, and start moving into data wrangling with the dplyr package (which is part of the tidyverse). | |
| 3 | Feb 17th (Monday) | Color / Data Transformation with dplyr. For next time: We will continue learning about data wrangling using dplyr. Remember to keep up with the DataCamp courses. | DC: Introduction to the Tidyverse |
| | Feb 19th (Wednesday) | Data Transformation with dplyr. For next time: We will continue learning about data wrangling using dplyr. Your next DataCamp course is due Friday night. | Read Chapter 2 |
| | Feb 21st (Friday) | Data Transformation with dplyr. For next time: We will start moving into text data. Your next DataCamp course is due Monday night (due to an error on my part when selecting the due date). Your Wednesday DC course is still due Wednesday. | DC: Introduction to Data Visualization with ggplot2 |
| 4 | Feb 24th (Monday) | Strings and Regular Expressions with stringr. For next time: We will continue learning about regular expressions. Your next DataCamp course is due tonight (due to an error on my part when selecting the due date). Your Wednesday DC course is still due Wednesday. | |
| | Feb 26th (Wednesday) | Strings and Regular Expressions with stringr. For next time: Your next DataCamp course is due tonight, and I’ve already assigned the two for next week. I’ll be at a conference on Thursday and Friday this week, so we will not have class on Friday. Use the time to work on the DataCamp courses. | DC: Intermediate Data Visualization with ggplot2 |
| | Feb 28th (Friday) | SIGCSE 2025 - NO CLASS | |
| 5 | Mar 3rd (Monday) | Reproducible Reports with Quarto. For next time: Keep up, as always, with the DataCamp courses. Our first “exam” is not really an exam, but a hands-on project. Choose any “real-world project” in R on DataCamp, complete it, and submit proof of completion (such as a screenshot) to Moodle by next Monday. | DC: Data Manipulation with dplyr |
| | Mar 5th (Wednesday) | Reproducible Reports with Quarto. For next time: We will do more with Quarto. Keep up with everything on DataCamp. | |
| | Mar 7th (Friday) | NO CLASS. For next time: Finish the DataCamp course for tonight, and make sure to finish a real-world project on DataCamp by Monday night. | DC: Joining Data with dplyr |
| 6 | Mar 10th (Monday) | Dates and Times with lubridate. For next time: Complete a real-world DataCamp project by tonight, and submit proof of completion to Moodle. Even a screenshot of the “replay” button for a project is enough. | Exam 1: Any Real-world R Project on DataCamp |
| | Mar 12th (Wednesday) | Dates and Times with lubridate. For next time: You have a DataCamp course due tonight. We will finish up the lubridate package next time. | DC: Introduction to Importing Data in R |
| | Mar 14th (Friday) | Dates and Times with lubridate. For next time: We will move into learning a little about databases next time. | |
| 7 | Mar 17th (Monday) | Databases with dbplyr. For next time: You have a DataCamp course due tonight, and one on Friday. | DC: Introduction to Statistics in R |
| | Mar 19th (Wednesday) | Databases with dbplyr. For next time: Finish the DataCamp course due Friday before break. We will do a somewhat random topic next time, since a few are traveling. | |
| | Mar 21st (Friday) | Computer Vision. For next time: We will start diving into the world of machine learning when we get back from break. | DC: Data Communication Concepts |
| 8 | Mar 24th (Monday) | SPRING HOLIDAY - NO CLASS | |
| | Mar 26th (Wednesday) | SPRING HOLIDAY - NO CLASS | |
| | Mar 28th (Friday) | SPRING HOLIDAY - NO CLASS | |
| 9 | Mar 31st (Monday) | Machine Learning: Decision Trees. For next time: We will spend the next two weeks or so on machine learning topics. I have gone ahead and put up your DataCamp courses for the next three weeks for those who want to work ahead. | |
| | Apr 2nd (Wednesday) | Machine Learning: Kaggle Competitions. For next time: We will continue next time with another machine learning topic. You have a DataCamp course due tonight. | DC: Working with Dates and Times in R |
| | Apr 4th (Friday) | Machine Learning: k-Nearest Neighbors (k-NN). For next time: You have two DataCamp courses due next week. We will move into learning about tidymodels next week. | |
| 10 | Apr 7th (Monday) | Machine Learning: Random Forests with tidymodels. For next time: You have a DataCamp course due tonight. I have gone ahead and assigned “Exam 2”, which will be exactly like last time: complete any real-world R project on DataCamp, as long as it is different from the one you did last time (and submit proof of completion to Moodle). | DC: Cleaning Data in R |
| | Apr 9th (Wednesday) | Machine Learning: Naive Bayes. For next time: As a reminder, I have gone ahead and assigned “Exam 2”, which will be exactly like last time: complete any real-world R project on DataCamp, as long as it is different from the one you did last time (and submit proof of completion to Moodle). We will look at an example of unsupervised learning next time. | |
| | Apr 11th (Friday) | Machine Learning: k-Means. For next time: I’ve gone ahead and assigned the rest of the DataCamp courses for the semester, for those who want to work ahead. We will learn about interactive visualization next week. | DC: Introduction to Writing Functions in R |
| 11 | Apr 14th (Monday) | Shiny. For next time: Note that, along with your other DataCamp courses, you have your “Exam 2” second real-world project due this week. We will continue learning about interactive dashboards next time (using shiny). | |
| | Apr 16th (Wednesday) | Shiny. For next time: More shiny. Finish your “Exam 2”, and upload your proof of completion to Moodle. | DC: Exploratory Data Analysis in R |
| | Apr 18th (Friday) | Shiny. For next time: Finish your “Exam 2”, and upload your proof of completion to Moodle. We will briefly look at forecasting time series next time. | Exam 2: Any Real-world R Project on DataCamp |
| 12 | Apr 21st (Monday) | Time Series. For next time: We will spend some time on neural networks. You do not have a DataCamp assignment on Friday. | DC: Introduction to Regression in R |
| | Apr 23rd (Wednesday) | Neural Networks. For next time: We will do one more day of neural networks. | |
| | Apr 25th (Friday) | Neural Networks. For next time: You have one DataCamp course next week. I’ve also gone ahead and assigned “Exam 3”, which will be exactly like the last two: complete any real-world R project on DataCamp, as long as it is different from the previous two (and submit proof of completion to Moodle). | |
| 13 | Apr 28th (Monday) | Geospatial Mapping. For next time: Keep working on your final three DataCamp courses, and do not forget about Exam 3 (a DataCamp project). We will look at text analysis next time. | |
| | Apr 30th (Wednesday) | Text Analysis. For next time: We will take a look at some ethics-related topics in data science. | DC: Supervised Learning in R: Classification |
| | May 2nd (Friday) | Ethics and Bad Visualizations. For next time: The signup sheet is now posted for the final presentation (under “General” above). Instructions are in the signup sheet, but I’ve decided to allow 1-2 people per presentation, your choice. | |
| 14 | May 5th (Monday) | Earthquake Explorer. For next time: Remember to sign up for your final presentation. | DC: Supervised Learning in R: Regression |
| | May 7th (Wednesday) | Sentiment Explorer. For next time: Remember to sign up for your final presentation. | Exam 3 |
| | May 9th (Friday) | Sentiment Explorer. Please have your final presentation topic selected by this date (11:59PM). If you miss this deadline, 15% will be deducted from your final presentation grade. All late DataCamp courses must also be finished by 11:59PM tonight for partial credit. | DC: Unsupervised Learning in R |
| 15 | May 14th (Wednesday) | Final Presentation (11:30AM - 2:00PM) | |