University: University of Wollongong (UOW)
Subject: CSCI312: Big Data Management
Question 3
Consider the following logical schema that implements a two-dimensional data cube.
The data cube contains information about students' enrolments in subjects.
Assume that the files student.txt, subject.txt, and enrolment.txt contain data consistent with the logical schema of the two-dimensional data cube given above. The internal format of each file is a sequence of values separated by commas (CSV format).
(1) Write a sequence of commands that load the files into HDFS. The location of the files in HDFS is up to you.
(2) Write HQL statements that create Hive tabular views of the files student.txt, subject.txt, and enrolment.txt loaded into HDFS.
(3) Write HQL statements to retrieve the following information from the data warehouse. Each correctly implemented statement is worth 1 mark.
(i) Find the total number of enrolments per student, per subject, per both student and subject, and overall. List the values of the attributes student-number and subject-code, and the total number of enrolments.
(ii) For each subject and each year, list the year (from enrolment-date), the subject-code, the scores in the subject in that year in ascending order of score, and the average of all scores in a year.
(iii) Find the average score over all subjects per year, per both subject and year, and per both student and year. You can use the row function year to extract a year from a date. List the values of the attributes year (from enrolment-date), student-number, and subject-code, and the average score.
(iv) For each student and each subject, list the pair student-number and subject-code together with the average score over all subjects in which the student is enrolled.
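Task (i) asks for counts over four grouping sets at once: per student, per subject, per student and subject, and the grand total. That is the shape of result a CUBE-style grouping (for example, Hive's `GROUP BY ... WITH CUBE`) would be expected to return. The following Python sketch only illustrates that result shape and is not an HQL answer. The sample rows and the attribute names `student_number` and `subject_code` are assumptions, because the schema itself is not reproduced here.

```python
from collections import Counter

# Hypothetical sample rows: (student_number, subject_code).
# The real attribute names depend on the schema, which is not shown here.
enrolments = [
    (1, "ISIT312"),
    (1, "CSCI317"),
    (2, "ISIT312"),
]


def cube_counts(rows):
    """Count rows for every grouping set over (student, subject).

    None marks an attribute that has been aggregated away, mirroring the
    NULLs that appear in the result of a CUBE-style GROUP BY.
    """
    counts = Counter()
    for student, subject in rows:
        counts[(student, subject)] += 1  # per student and subject
        counts[(student, None)] += 1     # per student
        counts[(None, subject)] += 1     # per subject
        counts[(None, None)] += 1        # grand total
    return dict(counts)


result = cube_counts(enrolments)
for key in sorted(result, key=str):
    print(key, result[key])
```

Each input row contributes to exactly four output groups, which is why a single cube-style statement can replace four separate aggregations combined with a union.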
Question 4
Consider the following logical schema of a relational database that implements a data cube with historical information about the subjects enrolled in and dropped by students.
(1) Write HBase shell commands to create a single HBase table that implements the logical schema given above.
Write HBase commands to load into the table information about at least two subjects, one student, two enrolments, and one drop. Remember that a student is allowed to enrol in and/or drop many subjects, and a subject can be enrolled in or dropped by many students.
Your HBase table must be created in a way that does not introduce any data redundancy when information about students, subjects, enrolments, and drops is entered into the table.
(2) Write HBase shell commands that implement the following queries and data manipulations on the HBase table created and loaded with data in the previous step. Each correctly implemented task is worth 1 mark.
(i) Find all information (student number and full name) about the students enrolled in the subject ISIT312.
(ii) Find all information (subject code and title) about the subject ISIT312.
(iii) Add a column family LECTURER and allow for two versions in each cell of the new column family.
(iv) Assume that lecturers are described by an employee number and a full name. Insert into the table information about a lecturer and about a subject taught by that lecturer. Assume that a lecturer teaches one subject and each subject is taught by one lecturer.
Question 5
In this question, we use the same logical schema of the two-dimensional data cube as in Question 3.
Assume that the files student.txt, subject.txt, and enrolment.txt contain data consistent with the logical schema of the two-dimensional data cube given above. The internal format of each file is a sequence of values separated by commas (CSV format).
Assume that the files have already been loaded into HDFS. Write Pig Latin statements that implement the following queries. Correct implementation of each query is worth 1 mark.
(1) Find the full names of students who enrolled in the subject with the code ISIT312.
(2) Find the student numbers of students who never enrolled in the subject with the code ISIT312.
(3) Find the student numbers of students who enrolled in both subjects with the codes ISIT312 and CSCI317.
(4) Find the subject codes together with the total number of students enrolled in each subject.
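Queries (2) and (3) are set operations: (2) is a difference (an anti-join of all students against those enrolled in ISIT312) and (3) is an intersection. The Python sketch below illustrates only that logic, not the Pig Latin itself. The sample data and the attribute layout are assumptions, since the schema is not reproduced here.

```python
# Hypothetical sample data; the attribute layout is an assumption.
students = [1, 2, 3]  # student numbers taken from the student file
enrolments = [        # (student_number, subject_code)
    (1, "ISIT312"),
    (1, "CSCI317"),
    (2, "CSCI317"),
]


def enrolled_in(code):
    """Return the set of student numbers enrolled in the given subject."""
    return {student for student, subject in enrolments if subject == code}


# Query (2): all students minus those enrolled in ISIT312.
never_isit312 = set(students) - enrolled_in("ISIT312")

# Query (3): students appearing in both enrolment sets.
both = enrolled_in("ISIT312") & enrolled_in("CSCI317")

print(sorted(never_isit312), sorted(both))
```

Student 3 has no enrolments at all, so query (2) has to start from the student data rather than from the enrolment data. In Pig Latin this would plausibly be an outer join followed by a null filter, or a COGROUP with an emptiness test.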
Question 6
In this question, we use the same logical schema of the two-dimensional data cube as in Question 3.
Assume that the files student.txt, subject.txt, and enrolment.txt contain data consistent with the logical schema of the two-dimensional data cube given above. The internal format of each file is a sequence of values separated by commas (CSV format).
Assume that the files have already been loaded into HDFS. Implement the following Spark-shell operations. Correct implementation of each operation is worth 1 mark.
(1) Create DataFrames that contain information about students, enrolments, and subjects.
(2) Implement a query that accesses the DataFrames created in the previous step and finds the total number of enrolments in the subject ISIT312.
(3) Implement a query that accesses the DataFrames created in the previous step and, for each student, finds the total number of enrolments performed by that student.
(4) Register the DataFrames that contain information about the students, enrolments, and subjects as SQL temporary views.
(5) Use the SQL views created in the previous step to find the titles of subjects together with the total number of students enrolled in each subject.
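The SQL behind operation (5) is a join between subjects and enrolments followed by a grouped count. The sketch below runs such a query against an in-memory SQLite database from Python, used here purely as a stand-in for Spark SQL temporary views. The table names, column names, and sample titles are hypothetical, because the schema is not reproduced here.

```python
import sqlite3

# In-memory SQLite stands in for the Spark SQL temporary views.
# Table names, column names, and titles below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (subject_code TEXT, title TEXT);
    CREATE TABLE enrolment (student_number INTEGER, subject_code TEXT);
    INSERT INTO subject VALUES
        ('ISIT312', 'Sample Title A'),
        ('CSCI317', 'Sample Title B');
    INSERT INTO enrolment VALUES
        (1, 'ISIT312'), (2, 'ISIT312'), (1, 'CSCI317');
""")

# Join subjects to enrolments and count distinct students per subject.
query = """
    SELECT s.title, COUNT(DISTINCT e.student_number) AS total_students
    FROM subject s
    JOIN enrolment e ON s.subject_code = e.subject_code
    GROUP BY s.subject_code, s.title
    ORDER BY s.title
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Grouping by the subject code as well as the title keeps two subjects that share a title from being merged into one group. An inner join omits subjects with no enrolments; whether those should be listed with a count of zero depends on how the task is read.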