Statistics Archival Data Research Project

This assignment information has been copied and pasted from my course.

This is a mini-research project in which you will conduct research using some form of archival data, that is, data that is available in the public domain and that does not require asking anyone questions. The data must already have been collected. This project includes justifying why you are asking the question that you are asking, conducting the actual data compilation, statistically analyzing the data, and explaining what you found. These are the basic components of a research project: introduction (with a hypothesis for each explanatory variable), method, results, and discussion.

Here is a brief summary of what each section of this paper should include.

Introduction – A brief but clear statement of what data you are accessing and why you are accessing it. Your reasons may be personal, and that is fine, but you must also include at least two scholarly references on why it is important to understand this information. These references must come from peer-reviewed scholarly journals and not from the popular press. You must also state your hypothesis about what you expect to find for each explanatory variable. This should include a null hypothesis.

Method – Your method of data collection must involve at least two explanatory variables and must use archival data, that is, data that has been collected by others and is available in the public domain, or data in the public domain that is easily counted or compiled. Where did you find your data? Be sure to give specific URLs, newspaper dates, physical addresses, and anything else relevant, and include the date or dates on which the data was collected. Also include how you went about collecting this information: Did you count something? Did you record instances in public-domain data over a period of time? Did you content analyze a text? Your degree of specificity in this section should be such that someone could replicate what you have done.

Results – Here you are to present the outcome of your data collection. First, provide the actual data in table format. You should also compute descriptive statistics on the data. These should include the five-number summary; the mean, median, and mode; and the standard deviation. Include either a chi-square analysis of your data (if it is categorical) or a correlational analysis (if it is continuous). You must show your mathematical calculations here and give the p-value of your findings. Depending on what data you gathered, it might make more sense to run one of the tests more than once, or to run both tests. Be sure to graph your results in at least one figure, and preferably more than one, appropriate to your data.
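
As a rough illustration of the descriptive statistics and the correlational analysis named above, here is a minimal Python sketch that uses only the standard library. The function names (`describe`, `pearson_r`) and all of the numbers are hypothetical and were invented for this example; they are not part of the assignment. Quartiles can be computed under several conventions, and this sketch assumes the "inclusive" method, so check which convention your course expects.

```python
import math
import statistics


def describe(values):
    """Five-number summary plus mean, mode(s), and sample standard deviation."""
    data = sorted(values)
    # "inclusive" is one of several quartile conventions (an assumption here)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "mean": statistics.mean(data),
        "modes": statistics.multimode(data),  # list, since ties are possible
        "sample_sd": statistics.stdev(data),  # divides by n - 1
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out step by step."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical data, invented purely for illustration
counts = [2, 4, 4, 5, 7, 9, 11]
summary = describe(counts)
for name, value in summary.items():
    print(f"{name}: {value}")

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

The sketch stops at the coefficient itself. The p-value for a correlation comes from a t distribution with n - 2 degrees of freedom, so it would have to be looked up in a table or computed with a statistics package.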

Discussion – Explain in written text what you showed in the results section. Did it fit your hypothesis? Why or why not? How does it demonstrate and/or contradict the findings of the related scholarly research you found—and why? What were the limitations of your research study? Was it all one sex? Were there too few data points? Was there a cultural bias?

References – Put here the two scholarly sources that you found and talked about in the introduction and the discussion. Use APA or MLA citation style.

There are some major data collection websites that allow you to examine various population questions. These include Statistics Canada and, in the United States, the Current Population Survey. You can search for data on many topics and get the information in table form.

Please let me know if you have any questions.

The following has also been copied from my course. Here are some suggestions for data analysis. It is preferable that you make up your own questions for the data set you are using. If you are unsure, ask your tutor.

How many likes versus comments on Facebook do three of your female friends get on their most recent post compared to three of your male friends on their most recent post? The two explanatory variables are type of posting (likes versus comments) and gender (female versus male). The response variable would be the number counted in each category of the explanatory variable.
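
For a categorical design like this one, the counts can be arranged in a 2x2 table (gender by type of posting) and tested with a chi-square test of independence. The sketch below is one possible way to write out the calculation using only the Python standard library. The function name `chi_square_2x2` and the counts are hypothetical, made up for illustration. It assumes no continuity correction and relies on the fact that, with one degree of freedom, the chi-square p-value equals `erfc(sqrt(chi2 / 2))`.

```python
import math


def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of counts.

    Returns the statistic, the p-value (1 degree of freedom, no
    continuity correction), and the table of expected counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed_count in enumerate(row):
            # Expected count if the two variables were independent
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            chi2 += (observed_count - e) ** 2 / e
        expected.append(expected_row)

    # For 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected


# Hypothetical totals, invented for illustration:
# rows = gender (female, male), columns = type of posting (likes, comments)
observed = [[30, 10], [20, 20]]
chi2, p_value, expected = chi_square_2x2(observed)
print(f"expected counts: {expected}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

With these invented counts, every expected cell is either 25 or 15, and the statistic works out to about 5.33. Real data with small expected counts (commonly, below 5 in a cell) may call for a different test, which is worth noting in the limitations.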

How many advertisements disguised as news stories (that is, they have a note somewhere that says “sponsored”) appear on Yahoo! versus Bing? Count these on weekdays and then on a weekend. The two explanatory variables are type of story (genuine news versus advertising disguised as news) and time of the week (weekdays versus weekends). The response variable would be the number counted in each category of the explanatory variable.

What are the hockey scores of your two favourite teams when they played at home versus when they played away? The two explanatory variables could be type of team (Canadian versus US) and where the games were played (home versus away). The response variable would be the scores in each category of the explanatory variable.

What are relevant video game statistics for your favourite games? The two explanatory variables could be type of game (first-person shooter versus role-playing) and gender of avatar (male versus female). The response variable would be the scores in each category of the explanatory variable.

How many seniors are working at least part time? The two explanatory variables could be gender of worker (male versus female) and type of work (professional versus non-professional). The response variable would be the count of the workers or the annual income in each category.
