General
- Syllabus
- R for Data Science (2nd Edition): Our textbook, which provides a broad overview of the field using R as the language of choice
- DataCamp: An online platform for learning data science - you'll need to create an account using your Wofford email address
⏰ Upcoming Events
- The National Consortium for Data Science (Upcoming Events)
General Data Science
- Data Science Prep: Get exceptionally good at data science interviews by getting real interview questions in your email inbox
- Kaggle: A great website for finding and sharing datasets, joining competitions, etc.
- Regex Crossword: A set of games designed to help you learn and practice using regular expressions.
- RegExr: An online tool to learn, build, and test regular expressions.
- Quarto: An open-source scientific and technical publishing system for creating dynamic content and publishing high-quality articles, reports, presentations, websites, blogs, and books in formats such as HTML, PDF, MS Word, and ePub, with scientific markdown support for equations, citations, cross-references, figure panels, callouts, and advanced layout.
- Visualizing k-Means Clustering: A nifty visualization tool for demonstrating the k-Means algorithm.
🛠️ R/IDEs
- R: A programming language that excels at data analytics
- IDEs (Integrated Development Environments)
  - RStudio: An IDE that is built for working in R and data science
  - Positron: A next-generation data science IDE
  - Visual Studio Code (VS Code): An IDE from Microsoft that is generalized to any type of programming language
- CRAN Package List: A list of the 20,000+ R packages that can be installed using the `install.packages()` function
- Posit Cheatsheet Collection: An excellent collection of printable cheatsheets on various R-based data science tools (RStudio, Quarto, ggplot2, shiny, etc.)
- Tidyverse: An opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures
- ggplot2
  - Documentation: A guide and reference on how to use the ggplot2 package in R
  - ggplot2 Extensions: Browse extensions that add additional functionality to ggplot2
  - Top 50 ggplot2 Visualizations: 50 chart examples with full R code provided
  - Color List: A list of color names that you can use in R
  - Custom Colors: A useful guide on how to use different color palettes (or your own colors) in ggplot2
- Tidymodels: A collection of packages for modeling and machine learning using tidyverse principles
🖥️ Code Examples
- Regular expressions (regex.R)
- Dates and times (dt.R)
- Databases (db.R)
- Computer vision (cv.R)
- Machine Learning
- Shiny Demos
- Forecasting (forecast.R)
- Geospatial mapping (world_map.R, us_map.R)
Course Schedule
| Week | Date | Topics | Due |
|---|---|---|---|
| 1 | Feb 4th (Wednesday) | Course Introduction. For next time: Read the syllabus, download and install R and Positron (or RStudio), sign up on DataCamp using the invitation link above, and start working on your first DataCamp assignment. | |
|  | Feb 6th (Friday) | The R Programming Language / Positron. For next time: Continue learning your way around Positron, and play around with R. We will continue with more R next time. Finish your first DataCamp course by Monday night (11:59PM). You will usually have three DataCamp courses every two weeks. | |
| 2 | Feb 9th (Monday) | The R Programming Language / Positron. For next time: Continue learning your way around Positron, and play around with R. Your first DataCamp course is due tonight, and your second course is due Friday night. Courses are always due at 11:59PM on their due date. We will move into data visualization next time. | |
|  | Feb 11th (Wednesday) | Data Visualization with ggplot2. For next time: We will continue learning about data visualization using the ggplot2 package. Since we are going to be using several packages in the Tidyverse, I recommend installing the entire Tidyverse by running | |
|  | Feb 13th (Friday) | Data Visualization with ggplot2. For next time: We will continue learning about data visualization using the ggplot2 package. Don't forget to keep up with the DataCamp courses. | |
| 3 | Feb 16th (Monday) | Data Visualization with ggplot2 / Kaggle. For next time: We will finish up the introduction to data visualization using ggplot2, and start moving into data wrangling with the dplyr package (which is part of the tidyverse). Read Chapter 1 of your textbook. | |
|  | Feb 18th (Wednesday) | Color / Data Transformation with dplyr. For next time: We will continue learning about data wrangling using dplyr. Start reading Chapter 3 of your textbook. Remember to keep up with the DataCamp courses. | |
|  | Feb 20th (Friday) | Data Transformation with dplyr. For next time: We will continue learning about data wrangling using dplyr. Your next DataCamp course is due Monday night. | |
| 4 | Feb 23rd (Monday) | Data Transformation with dplyr. For next time: We will start moving into text data. Your next DataCamp course is due tonight. | |
|  | Feb 25th (Wednesday) | Strings and Regular Expressions with stringr. For next time: We will continue learning about regular expressions. | |
|  | Feb 27th (Friday) | Strings and Regular Expressions with stringr. For next time: Your next DataCamp course is due tonight, and I've already assigned the one for next week. | |
| 5 | Mar 2nd (Monday) | Dates and Times with lubridate. For next time: You have a DataCamp course due Wednesday. We will finish up the lubridate package next time. | |
|  | Mar 4th (Wednesday) | Dates and Times with lubridate. For next time: You have a DataCamp course due tonight. We will move into a new topic next time (Quarto). | |
|  | Mar 6th (Friday) | Reproducible Reports with Quarto. For next time: We will do more with Quarto. Your next DataCamp course is due Monday night. | |
| 6 | Mar 9th (Monday) | Reproducible Reports with Quarto. For next time: You have a DataCamp course due tonight. We will move into a new topic next time (databases). | |
|  | Mar 11th (Wednesday) | Databases with dbplyr. For next time: Keep up on DataCamp courses. We will continue databases next time. | |
|  | Mar 13th (Friday) | Databases with dbplyr. For next time: Finish the DataCamp course tonight. We will take a brief look at the topic of computer vision next time. | |
| 7 | Mar 16th (Monday) | Computer Vision. For next time: We will start diving into the world of machine learning next time. | |
|  | Mar 18th (Wednesday) | Machine Learning: Decision Trees. For next time: Finish your DataCamp real-world project (Exam 1). You have a DataCamp course due tonight. | |
|  | Mar 20th (Friday) | Machine Learning: Kaggle Competitions. For next time: Finish your DataCamp real-world project (Exam 1) by tonight. We will do more machine learning after break. | |
| 8 | Mar 23rd (Monday) | SPRING HOLIDAY - NO CLASS | |
|  | Mar 25th (Wednesday) | SPRING HOLIDAY - NO CLASS | |
|  | Mar 27th (Friday) | SPRING HOLIDAY - NO CLASS | |
| 9 | Mar 30th (Monday) | Machine Learning: k-Nearest Neighbors (k-NN). For next time: Your next DataCamp course is due Friday. We will explore tidymodels next time. | |
|  | Apr 1st (Wednesday) | Machine Learning: Random Forests with tidymodels. For next time: Finish your DataCamp course by Friday night. We will do a few more topics in machine learning. | |
|  | Apr 3rd (Friday) | Machine Learning: Naive Bayes. For next time: A DataCamp course is due tonight. We will look at one more machine learning model before moving into something new. | |
| 10 | Apr 6th (Monday) | Machine Learning: k-Means. For next time: You have one DataCamp course due this week, and I'll go ahead and put up the two for next week. I'm also assigning your "Exam 2", which is just completing another DataCamp project (any real-world project in R). We will start Shiny next time. | |
|  | Apr 8th (Wednesday) | Shiny. For next time: Finish your DataCamp course by tonight. We will look at more Shiny next time. | |
|  | Apr 10th (Friday) | Shiny. For next time: Note that, along with your other DataCamp courses, your second real-world project ("Exam 2") is due this week. We will continue learning about interactive dashboards next time (using Shiny). | |
| 11 | Apr 13th (Monday) | Shiny. For next time: You have a DataCamp course due tonight, but also remember to complete a second real-world project (which is your "Exam 2") and upload proof of completion to Moodle by Wednesday night. We will look at time series / forecasting next time. | |
|  | Apr 15th (Wednesday) | Time Series / Forecasting. For next time: Submit your Exam 2 by tonight. We will look at geospatial mapping next time. | |
|  | Apr 17th (Friday) | Geospatial Mapping. For next time: You have a DataCamp course due tonight. | |
| 12 | Apr 20th (Monday) | Text Analysis. For next time: You have one DataCamp course this week (due Wednesday night). I've gone ahead and put up the two for next week. | |
|  | Apr 22nd (Wednesday) |  | |
|  | Apr 24th (Friday) |  | |
| 13 | Apr 27th (Monday) |  | |
|  | Apr 29th (Wednesday) |  | |
|  | May 1st (Friday) |  | |
| 14 | May 4th (Monday) |  | |
|  | May 6th (Wednesday) |  | |
|  | May 8th (Friday) |  | |
| 15 | May 13th (Wednesday) | Final Presentation (11:30AM - 2:00PM) | |