I will send the files needed later
The dataset for this assessment is the PAMAP2 Physical Activity Monitoring dataset, which is available from an external site and also from Canvas. It contains data on 18 different physical activities (such as walking, cycling and playing soccer) performed by 9 subjects wearing 3 inertial measurement units (IMUs) and a heart rate monitor. The data is stored in individual text files, one per subject. Each row in each file represents one reading and contains 54 attributes (including timestamp, activity ID, heart rate and IMU sensory data). Please see the Report2_data_info.pdf file supplied with the dataset (also available in the “Assessment Materials” folder in the Files tab) for information about the different attributes and how the dataset was collected.

Assuming the goal is to develop hardware and/or software which can determine the amount (using start/end times and heart rates) and type of physical activity carried out by an individual, what actionable insights can you derive from the dataset?

Specific Requirements

You are required to:
carry out thorough exploratory data analysis and appropriately handle missing or dirty data;
develop and test at least one hypothesis for a relationship between a single pair of attributes;
develop and test at least one model which uses multiple attributes to make predictions.

For this assessment you must submit a .zip file containing two files: the Jupyter notebook (.ipynb) and a .pdf export of the notebook.

Marking Criteria

This coursework will be marked out of 100, based on the following criteria:

Overall quality of report: 20 marks

This concerns issues such as writing style, organisation of material, clarity of presentation, etc. Your submission should be approximately 4000 words in length (excluding the content of graphs, tables, references and code). You must submit both a Jupyter notebook and a pdf file. The pdf file should be obtained by exporting the Jupyter notebook to pdf.
You should also include any other files which are necessary to run your notebook (e.g., images) but not the original data file. You should use a formal writing style. The questions/requirements outlined above should be clearly answered in your work. Use sections and subsections with appropriate and informative headings. You should relate your findings to the overall aim and comment on any other analysis which could be carried out on this or other datasets.

Quality of code: 20 marks

This is not really a programming assignment. However, all of your code should be included in the submission so that we can verify what you have done. One of the advantages of using Jupyter notebooks is that, when we are marking your submission, we will be able to run your code in order to establish that it produces the outputs that you describe. Therefore it is important that your code is both clear and correct. You are encouraged to use library methods. You should use functions (or classes) to modularize your code if appropriate. You should use comments where appropriate in your code. The Jupyter notebook that you submit should include all of the Python code you have written or adapted, i.e., all code that you have used in your experiments other than library methods.

Quality of analysis: 60 marks (20 marks for each of the 3 specific requirements)

This concerns the quality of your answers to each of the requirements. Always back up claims with evidence. Always interpret the results of any experiments. The text in your Jupyter notebook should make it possible for another data scientist to replicate your experiments without referring to your code. Experimental methodology should be sound and clearly explained. You should demonstrate a range of different methods, but you do not need to be exhaustive. Indeed, attempting to be exhaustive will usually lead to a lack of clarity and should be avoided. You should justify your choice of methods.
Do not just repeat details directly from the lecture notes or other sources. Explanations should be in your own words and contain examples from your experiments.

It is important to keep in mind that the assessment for this module is based on the content in this module. When I say “develop and test a model”, I mean using the methods from this module, not those from the other machine learning modules you are currently studying. When I say “test hypotheses”, I mean using the methods outlined in this module.

Beyond the assessment-specific criteria outlined above, the generic marking criteria for Informatics will apply.
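
As a starting point for the exploratory analysis, the data description above (one text file per subject, 54 attributes per row, with activity "amount" measured through start/end times and heart rates) could be approached along the following lines. This is only a minimal sketch under stated assumptions, not a prescribed method: the column order (timestamp, activity ID, heart rate, then 17 columns for each of the 3 IMUs), the whitespace-separated layout, the `NaN` marker for missing values and the sensor names `hand`, `chest` and `ankle` are all assumptions that should be checked against Report2_data_info.pdf. The function names are hypothetical, forward-filling heart rate is just one possible way to handle missing readings, and the sample rows are made up purely to exercise the code.

```python
import io

import pandas as pd

# Assumed layout (verify against Report2_data_info.pdf):
# timestamp, activity ID, heart rate, then 17 columns per IMU.
IMU_NAMES = ["hand", "chest", "ankle"]  # assumed sensor positions
COLUMNS = ["timestamp", "activity_id", "heart_rate"] + [
    f"{imu}_{i}" for imu in IMU_NAMES for i in range(1, 18)
]


def load_subject(source):
    """Read one subject file (assumed whitespace-separated, 'NaN' = missing)."""
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMNS)


def clean(df):
    """Forward-fill heart rate, then drop rows that still have no value."""
    out = df.copy()
    out["heart_rate"] = out["heart_rate"].ffill()
    return out.dropna(subset=["heart_rate"]).reset_index(drop=True)


def summarise_activities(df):
    """Start, end, duration and mean heart rate per contiguous activity bout."""
    # A new bout starts whenever the activity ID changes between rows.
    bout = (df["activity_id"] != df["activity_id"].shift()).cumsum().rename("bout")
    summary = df.groupby(bout).agg(
        activity_id=("activity_id", "first"),
        start=("timestamp", "min"),
        end=("timestamp", "max"),
        mean_heart_rate=("heart_rate", "mean"),
    )
    summary["duration"] = summary["end"] - summary["start"]
    return summary.reset_index(drop=True)


def make_synthetic_sample():
    """Made-up rows in the assumed layout, for demonstration only."""
    rows = []
    for i in range(6):
        activity = 1 if i < 3 else 2
        heart_rate = "NaN" if i % 2 else str(100 + i)
        imu = " ".join(["0.5"] * 51)
        rows.append(f"{i * 0.01:.2f} {activity} {heart_rate} {imu}")
    return io.StringIO("\n".join(rows))


if __name__ == "__main__":
    data = clean(load_subject(make_synthetic_sample()))
    print(summarise_activities(data))
```

With real data, `load_subject` would be given the path to a subject's text file instead of the synthetic sample. Whichever approach is chosen for missing values, the report should justify it, as the marking criteria require.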