This is an RStudio file, and you need RStudio to get the labs and the research paper done. Below is all the code for labs 1 through 8, except lab 4. I need labs 4, 9, and 10 done; I'll attach the videos for each lab. I'll also attach my paper chunks #1, #2, and #3 to help with the remaining research paper chunks, which are #4, #5, and #6. Examples for chunks #4, #5, and #6 are in the files section.
lab 4
Your fourth assignment is to:
Submit an R script with:
1. Your name and section number at the top, commented out.
2. The code for accessing your working directory.
3. The code to read in the data file avocado-prices.csv from last week (also available for download). (Note: if you are using RStudio in the cloud, you must upload the data first.)
4. The code to find out how many rows and columns are in the data. Give the answer (“__ rows, __ columns”, but fill in numbers for __) in a comment.
5. Code to find out the names of the columns in the data. You have learned three ways to find this out, so pick one. Give the column names in a comment.
6. Code to extract (“pluck”) the region for row 3537. Give the answer in a comment.
7. Code to extract the entire AveragePrice column. No answer necessary in the comment.
(Note: An R script is the .R file you save in your working directory.)
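The Lab 4 steps above map onto a handful of base R calls. The sketch below is only an illustration under stated assumptions, not the assignment answer: it builds a tiny invented data frame (`toy`) in place of avocado-prices.csv so that it runs anywhere, and only the column names `region` and `AveragePrice` come from the instructions. With the real file, the commented `read.csv` line would replace the toy data, and row 3 would become row 3537.

```r
# Name and section number would go here, commented out.
getwd()   # shows the current working directory
# avocado <- read.csv("avocado-prices.csv")  # real data; file must be in the working directory

# Toy stand-in for the real file (values are invented for illustration only)
toy <- data.frame(
  region = c("Albany", "Boston", "Chicago", "Denver"),
  AveragePrice = c(1.33, 1.35, 0.93, 1.08)
)

dims <- dim(toy)                # c(number of rows, number of columns)
col_names <- names(toy)         # column names; colnames(toy) works too
third_region <- toy$region[3]   # "pluck" one cell: pick the column, then the row
# toy[3, "region"] returns the same cell using [row, column] indexing
prices <- toy$AveragePrice      # the entire column as a vector
```

The `$` form and the `[row, column]` form are interchangeable for a single cell; which one to use is mostly a matter of readability.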
lab 9
For today’s lab assignment, you’re going to practice visualization, and combine it with a few other things you’ve learned.
1) Pick any data set we’ve worked with so far this semester (avocados, the class quiz, etc.)
2) Pick two variables: a continuous dependent variable and an independent variable of any type. Explain which ones they are in your R script in a comment.
3) Run a bivariate linear regression to estimate the relationship between the two variables. Remember that the dependent variable goes on the left!
4) Use the summary command to print the regression output to the console.
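As a rough illustration of steps 3 and 4, here is a minimal sketch of a bivariate regression. It assumes invented data: the data frame `toy` and the variables `x` and `y` are hypothetical stand-ins for whichever data set and variables you pick.

```r
set.seed(1)  # makes the invented data reproducible

# Invented data: x is the independent variable, y the continuous dependent variable
toy <- data.frame(x = 1:50)
toy$y <- 2 + 0.5 * toy$x + rnorm(50)

# The dependent variable goes on the left of ~, the independent variable on the right
mod <- lm(y ~ x, data = toy)
summary(mod)  # prints coefficients, standard errors, p-values, and R-squared
```

Column names that contain spaces or question marks need backticks inside the formula, for example `` lm(y ~ `What is your age?`, data = dat) ``.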
lab 10
For today’s lab assignment, I’m going to properly edit the instructions, unlike last week (sorry), and you’re going to practice running a multivariate regression, obtaining the summary of that regression, making a basic coefficient plot, and getting confidence intervals.
1) Using the same data set you used for Lab 9, pick at least one additional independent variable (of any type), and once again identify the same independent and (continuous) dependent variable as from your last lab. Use the summary function to print the regression output to the console.
2) Re-run the bivariate linear regression from last week.
3) Run a multivariate linear regression to estimate the relationship between the dependent variable and the two independent variables. Remember that the dependent variable goes on the left! Does the coefficient on your independent variable from the bivariate regression change? In what direction and by how much? (You don’t need to calculate how much. I’m just looking for “a lot”, “not so much”, “not at all”, etc. This is somewhat subjective.)
4) Use the summary function to print the regression output to the console.
5) Use the coefplot function (in the library arm) to make a basic coefficient plot of EACH of the two regressions you ran. (Note: You should produce two plots, not one plot containing both.)
6) Use the confint function to print the confidence intervals for all the estimates to the console for EACH of the two regressions you ran.
paper chunk #4
Once you have some data, you need to explore and write about that data before you can do any kind of analysis. You should use both data visualization and descriptive statistics to address all the questions below. Please submit your write-up and a single R file that demonstrates how you make your visualizations and calculate your descriptive statistics.
1. Visualize and provide descriptive statistics for your dependent variable.
2. Same as (1), but for your independent variable. If you did an experiment (with randomly assigned treatments), report (but do not visualize, it’s a very boring visualization) how many people received each treatment.
3. Discuss the sample itself. What is the unit of analysis? How was your data collected? If you are collecting data on individuals, is this a random sample or a convenience sample? If you are collecting data on geographic units (countries, states, cities, etc.), which units and years did you collect data for and why? Regardless of the unit of analysis, does it look representative of the broader population? How do you know?
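For items 1 and 2 above, descriptive statistics and basic plots could be produced along these lines. This is a sketch on invented data: the data frame `toy` and its columns `age` and `student` are hypothetical, and the plotting part only runs if the ggplot2 package is installed.

```r
set.seed(2)

# Invented survey-style data (not taken from any real file)
toy <- data.frame(
  age = round(runif(60, 18, 40)),
  student = sample(c("Yes", "No"), 60, replace = TRUE)
)

# Descriptive statistics for a continuous variable
age_summary <- summary(toy$age)       # min, quartiles, median, mean, max
age_sd <- sd(toy$age, na.rm = TRUE)   # standard deviation

# Descriptive statistics for a categorical variable: counts per category
student_counts <- table(toy$student)

# Visualizations (skipped if ggplot2 is not installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_age <- ggplot(toy, aes(x = age)) + geom_histogram(binwidth = 2)
  p_student <- ggplot(toy, aes(x = student)) + geom_bar()
  print(p_age)
  print(p_student)
}
```

A histogram suits a continuous variable and a bar chart suits a categorical one; for an experiment, the `table()` call alone reports how many people received each treatment.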
paper chunk #5
Your fifth paper chunk will include a description of the statistical model you’re running, a table that explains which signs your hypothesis predicts for the coefficients, and the results of a bivariate linear regression.
Your assignment must include:
1. A clear explanation of the statistical model you’re running (you’re running ordinary least squares linear regressions; you should include the equation for your linear regression).
2. A table or other clear explanation of what signs you expect the coefficients for your independent variables to have.
3. You should run the bivariate linear model and include a presentation of the results. Results should be included in a nicely-formatted table. To make the table below with a linear model object called mod, I ran:
install.packages("stargazer")
library(stargazer)
stargazer(mod, no.space = TRUE, type = "text")
4. In addition to the table, you must also interpret the results and explain whether they support or do not support your hypothesis.
Example
Here’s an example. If you want, you can use this as a direct model and just replace my nouns/numbers/equations/table with your nouns/numbers/equations/table:
My primary hypothesis is that there is a positive relationship between how early a student starts their writing assignment and the quality of the assignment. My dependent variable is assignment quality, measured on a scale of 0-100, with 100 being the highest quality and 0 being the lowest quality. My independent variable is how many days before the due date a student begins the assignment. To test this, I first ran a bivariate linear regression, which estimates the equation
quality_i = b_0 + b_1 * daysbefore_i
where i represents each individual student. Since my hypothesis predicts that the more days before the assignment is due a student starts, the higher quality their paper will be, I expect the coefficient b_1 to have a positive sign, indicating a positive relationship.
The results of my bivariate regression are in Table 1.
===============================================
                        Dependent variable:
                    ---------------------------
                          Quality (0-100)
-----------------------------------------------
Days before                  6.339***
                              (0.210)
Constant                    40.438***
                              (1.044)
-----------------------------------------------
Observations                    100
R2                             0.903
Adjusted R2                    0.902
Residual Std. Error      4.953 (df = 98)
F Statistic          913.139*** (df = 1; 98)
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
Table 1: Results of bivariate linear regression.
The coefficient on my independent variable (“Days before”) is 6.339. This means that for every one additional day before the assignment a student begins working on it, the model predicts their writing assignment will be 6.339 points higher in quality. As my hypothesis predicted, this coefficient is positive: starting more days before is associated with higher quality writing assignments. The p-value on this coefficient is below 0.05, which means the result is statistically significant, so I expect that the value in the population is also positive and not zero. This supports my hypothesis. The constant is 40.438, which means that a student who starts the assignment 0 days before it is due is expected to have an assignment with quality rating 40.438. The constant is also statistically significant, which means that in the population I expect it is also positive and not zero.
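The numbers interpreted above can also be pulled directly out of a fitted model rather than read off the table. The following is a hedged sketch on invented data: `toy`, `daysbefore`, and `quality` are hypothetical names modeled on the example, and the simulated values will not reproduce Table 1.

```r
set.seed(3)

# Invented data loosely shaped like the example; it will NOT reproduce Table 1
toy <- data.frame(daysbefore = sample(0:14, 100, replace = TRUE))
toy$quality <- 40 + 3 * toy$daysbefore + rnorm(100, sd = 5)

mod <- lm(quality ~ daysbefore, data = toy)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
est <- summary(mod)$coefficients
slope <- est["daysbefore", "Estimate"]       # predicted change in quality per extra day
p_value <- est["daysbefore", "Pr(>|t|)"]     # compare against 0.05
intercept <- est["(Intercept)", "Estimate"]  # predicted quality at 0 days before

# The constant equals the model's prediction for a student who starts 0 days before
pred0 <- predict(mod, newdata = data.frame(daysbefore = 0))
```

Comparing `pred0` with `intercept` is a quick check that the constant really is the predicted value when the independent variable equals zero.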
paper chunk #6
Your sixth paper chunk will include a description of the statistical model you’re running and the results of a multivariate linear regression.
Your assignment must include:
- A clear explanation of the statistical model you’re running (you’re running ordinary least squares linear regressions — you should include the equation for your linear regression). This is literally exactly the same as what you did for #5 except you’ll have some additional variables in the regression equation. You can just copy and paste and then add in the extra variables.
- You should run the multivariate linear model and include a presentation of the results. Results should be included in a nicely-formatted table. Again, this is almost exactly the same thing you did for #5, but with more variables.
- In addition to the table, you must also interpret the results and explain whether they support or do not support your hypothesis. Do not just copy and paste for this one because the coefficient on your independent variable probably changed.
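To see why the coefficient on the original independent variable can change, here is a minimal sketch that fits a bivariate and a multivariate model on the same invented data and then prints confidence intervals and coefficient plots, as Lab 10 asks. The data frame `toy` and the variables `x1`, `x2`, and `y` are hypothetical, and the plots are only drawn if the arm package is installed.

```r
set.seed(4)

# Invented data with two correlated independent variables
toy <- data.frame(x1 = rnorm(200))
toy$x2 <- 0.6 * toy$x1 + rnorm(200)
toy$y <- 1 + 2 * toy$x1 + 1.5 * toy$x2 + rnorm(200)

mod_bi <- lm(y ~ x1, data = toy)           # bivariate model
mod_multi <- lm(y ~ x1 + x2, data = toy)   # adds the second independent variable
summary(mod_multi)

# How much the coefficient on x1 moves once x2 is included
change_x1 <- coef(mod_multi)[["x1"]] - coef(mod_bi)[["x1"]]

# 95% confidence intervals for every estimate in each model
ci_bi <- confint(mod_bi)
ci_multi <- confint(mod_multi)
print(ci_bi)
print(ci_multi)

# Coefficient plots, one per model (skipped if the arm package is not installed)
if (requireNamespace("arm", quietly = TRUE)) {
  arm::coefplot(mod_bi)
  arm::coefplot(mod_multi)
}
```

In this made-up case the coefficient on `x1` shrinks in the multivariate model because `x2` is correlated with `x1` and also affects `y`, so the bivariate model credits part of the effect of `x2` to `x1`.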
# Innocent Nwajiaku, POL 3085
# install.packages("readr")
# install.packages("dplyr")
library(readr)
library(dplyr)
dat <- read_csv("quant-activity.csv")
# Standardize the state names to upper case
dat$`Which state do you live in? (Full name, no abbreviations, please)` <- toupper(dat$`Which state do you live in? (Full name, no abbreviations, please)`)
table(dat$`Which state do you live in? (Full name, no abbreviations, please)`)
dat$state2 <- toupper(dat$`Which state do you live in? (Full name, no abbreviations, please)`)
table(dat$state2)
# Collapse the different spellings of Minnesota into one value
# (state2 is already upper case, so the old values must be upper case too)
dat$state_final = recode(dat$state2, "MN" = "MINNESOTA", "MINNEAPOLIS, MINNESOTA" = "MINNESOTA")
table(dat$state_final)
# 42 people are from Minnesota
table(dat$`Do/did your parents share your political views?`)
dat$howmany <- dat$`Do/did your parents share your political views?`
table(dat$howmany)
dat$howmany_final = recode(dat$howmany, "No" = 0, "They're divided" = 1, "Yes" = 2)
table(dat$howmany_final)
# They do not have enough data to record.
# Innocent Nwajiaku, POL 3085 (001)
# Short alias for the ideology question so the long column name appears only once
ideology <- dat$`Where would you place yourself on an ideological scale from 1 (very conservative) to 10 (very liberal)?`
table(ideology)
sort(table(ideology))
median(ideology, na.rm = TRUE)
mean(ideology, na.rm = TRUE)
# Median ideology within each state
tapply(ideology, dat$state_final, median, na.rm = TRUE)
# Yes, because the majority of states share the same value of being very liberal.
library(readr)
library(dplyr)
# install.packages("ggplot2")
library(ggplot2)
dat <- read_csv("quant-activity.csv")
names(dat)
table(dat$Timestamp)
table(dat$`What is your age?`)
# Continuous variable: "What is your age?"; categorical variable: "Are you a student?"
# A box plot shows the relationship between a categorical and a continuous variable.
pat <- read_csv("ddrevisited.csv")
table(pat$democracy, pat$year)
dat$howmany <- dat$`Regarding the previous question, roughly how many of your friends would you say fall into that category?`
# Column names with spaces need backticks, not quotes, inside aes()
plotA <- ggplot(dat, aes(x = `What is your age?`))
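dat <- read_csv("quant-activity.csv")
names(dat)
# Re-create howmany, since re-reading the data drops it
dat$howmany <- dat$`Regarding the previous question, roughly how many of your friends would you say fall into that category?`
# lm() needs a numeric dependent variable; recode text answers to numbers first
mod <- lm(howmany ~ `What is your age?`, data = dat)
# Dependent variable: "Which party do you most closely align with?"
# Independent variable: "What is your age?"
mod = lm(`Which party do you most closely align with?` ~ `What is your age?`, data = dat)
summary(mod)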
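library(readr)
library(dplyr)
dat.demo = read_csv("ddrevisited.csv")
names(dat)
names(dat.demo)
# democracy and un_region_name are columns of ddrevisited.csv, so use dat.demo
table(dat.demo$democracy)
table(dat$`Do you think someone could accurately guess at your political or ideological positions just by looking at you?`)
table(dat.demo$un_region_name)
# An unfinished tapply() call on the "Do you think someone could accurately guess..." column was left here;
# it still needs a grouping variable and a function.
# Check the exact column names against names(dat.demo) before running this line
tapply(dat.demo$cryname, dat.demo$bornyear, table)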
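lab 8
names(dat)
# Two variables: continuous = "What is your age?", dichotomous = "Are you a student?"
# Column names go in backticks inside aes() and must match names(dat) exactly
# Visualize each variable on its own
ggplot(dat, aes(x = `What is your age?`)) + geom_bar()
# A bar chart suits a dichotomous variable; a box plot needs a continuous variable
ggplot(dat, aes(x = `Are you a student?`)) + geom_bar()
# Box plot of age within each student group
ggplot(dat, aes(x = `Are you a student?`, y = `What is your age?`)) + geom_boxplot()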