Assignments
See the schedule for each due date. Assignments are due at 11:55pm on the due date. Submit all assignments on Sakai. For work completed in R and RStudio, submit two files: the R Markdown (.Rmd) file and the rendered HTML file. For work completed in MySQL, submit either a PNG of the data model or a PDF of the SQL code including its results.
Individual
id | topic | description | points |
A1 | linear model | use the Amazon data to build a linear model of your choice in R | 8 |
A2 | web scraping | pick a website or API and scrape data using R | 8 |
A3 | rmarkdown | integrate your work from the Amazon case into one Rmd report | 8 |
A4 | data modeling | sketch a data model for the Academic Success Center, a not-for-profit center located in Winston-Salem, NC | 8 |
A5 | sql | write 8 queries to answer questions from the Classical Models database | 8 |
Extra credit opportunity: Use PuTTY to access the DEAC cluster (either pegasus or gemini) and successfully submit a job with R (by Dec 9 at 11:55pm) for 1 extra credit point added to your final grade. Once you receive the completion e-mail, check that your output is what you expected. Upload three files to Sakai: your R script, the output file, and the completion e-mail from HPC as a PDF. Note: You may use the tweets_2014-01-23_02-11_156285.csv data, but you may not use the example R script from the "HPC on the DEAC Cluster" video.
Group
Each group must give a state-of-the-art presentation on an R package not covered in class, with a particular focus on data science.
- You will give a 10-15 minute presentation to the class
- Points will be deducted for exceeding 15 minutes
- Focus on the applications of the package and the opportunities it provides
- E-mail the instructor your code and slides 2 hours before your presentation
Packages (assigned teams in parentheses):
- neuralnet
- sparklyr (Teams 8 & 17)
- janitor (Teams 10 & 15)
- mice (Team 16)
- arules (Team 9)
- shiny (Teams 1 & 14)
- SentimentAnalysis (Teams 3 & 19)
- randomForest (Teams 6 & 12)
- caret (Teams 7 & 18)
- tensorflow (Team 13)
- CORElearn
- esquisse (Team 2)
- mlr (Team 20)
- quanteda (Teams 4 & 11)
- Rcrawler
When submitting the database project, you should provide the following:
- Your team name and the names of its members
- A one paragraph description of the database
- The data model in png format (File > Export > Export as PNG…)
- 10 queries, each with:
  - A natural language description of the query
  - The SQL and the results (Query > Execute (All or Selection) to Text)
    - Copy and paste the results into Word so that everything is in one document
- The 10 queries should cover the following SQL features:
  - multiple table join
  - subquery
  - correlated subquery
  - GROUP BY and GROUP BY with HAVING
  - ORDER BY
  - divide (relational division)
  - IN or NOT IN
  - a built-in function (e.g., AVG) or a calculated field
  - REGEXP
  - EXISTS or NOT EXISTS, in a query other than the divide query
Mark which SQL features each query covers in a table like this:
feature | query 1 | query 2 | ... |
multiple table join | X | | |
subquery | X | X | |
... | X | | |
- Submit the report as a PDF attachment to the assignment dropbox on Sakai.