University | University of Wollongong (UOW) |
Subject | CSCI312: Big Data Management |
Assignment 2
Scope
The objectives of Assignment 2 include conceptual modelling of a data warehouse, implementation of 0NF tables in HQL, implementation of external tables in HQL, and querying a data cube. This assignment is due on Sunday, 11 May 2025, 8:00 pm (sharp) Singaporean Time (SGT). This assignment is worth 20% of the total evaluation of the subject. The assignment consists of 4 tasks and the specification of each task starts from a new page.
Only electronic submission through Moodle at: https://moodle.uowplatform.edu.au/login/index.php will be accepted. A submission procedure is explained at the end of the Assignment 2 specification.
A policy regarding late submissions is included in the subject outline. Only one submission of Assignment 2 is allowed and only one submission per student is accepted. A late submission penalty (5% of the total mark) will be applied for every 24 hours late. A submission that contains an incorrect file attached is treated as a correct submission with all consequences coming from the evaluation of the file attached.
All files left on Moodle in a state “Draft (not submitted)” will NOT be evaluated. An implementation that does not compile well due to one or more syntactical and/or run time errors scores no marks.
The second assignment is an individual assignment, and it is expected that all its tasks will be solved individually without any cooperation with the other students. However, it is allowed to declare in the submission comments that a particular component or task of this assignment has been implemented in cooperation with another student. In such a case evaluation of a task or component may be shared with another student. In all other cases plagiarism will result in a FAIL grade being recorded for entire assignment. If you have any doubts, questions, etc. please consult your lecturer or tutor during laboratory/tutorial classes or over e-mail.
Hire a Professional Essay & Assignment Writer for completing your Academic Assessments
Native Singapore Writers Team
- 100% Plagiarism-Free Essay
- Highest Satisfaction Rate
- Free Revision
- On-Time Delivery
Task 1 (5 marks)
Intuitive design of a data cube from a conceptual schema of an operational database
Consider the following conceptual schema of an operational database owned by a multinational real estate company. The database contains information about the real estate properties offered for sale, owners of the properties, potential buyers who are interested in the properties and real estate agents involved in the selling of the properties.
Whenever a property is put on the market by an owner, a description of the property is entered into an operational database. Whenever a property is purchased, its description is removed from an operational database.
The real estate company would like to create a data warehouse to keep information about the finalized real estate transactions, properties involved in the transactions, sellers/owners, and agents involved in the real estate transactions. The real estate company would like to use a data warehouse to implement the following classes of analytical applications.
- Find the total number of real estate properties sold per month, year, street, city, country, and agent involved.
- Find an average asked price of real estate properties sold per month, year, street, city, country, and agent involved.
- Find an average final price of real estate properties sold per month, year, street, city, country, and agent involved.
- Find an average period of time on the market of real estate properties sold per month, year, street, city, country, and agent involved.
- Find the total number of times each real estate property has been sold in a given period of time.
- Find the total number of buyers interested in purchases of real estate properties sold per day, month, year, street, city, country, and agent involved.
Note, the operational database does not contain all information necessary to implement the classes of applications listed above. Additional information must be added when data is transferred from an operational database to a data warehouse.
- Using the short explanation of the database domain and the conceptual schema provided above, identify a suitable data cube that the multinational real estate company could implement for its data warehouse. In your specification, list the facts, measures, dimension names, and their corresponding hierarchies. For each measure, include an explanation of how it is derived.
- Pick any three dimensions from a data cube found in the previous step and at least 4 values in each dimension and one measure to draw a sample three-dimensional data cube in a perspective view similar to a view included in a presentation 09 Data Warehouse Concepts, slide 6.
Deliverables
A file solution1.pdf that contains
- a specification of data cube as a list of facts, measures, dimensions, and hierarchies obtained as a result of task (1),
- a perspective drawing of a three-dimensional data cube as a result of task (2).
Task 2 (5 marks)
Conceptual modelling of a data warehouse
An objective of this task is to create a conceptual schema of a sample data warehouse domain described below. Read and analyse the following specification of a data warehouse domain.
A person is represented as either a patient or a medical worker or an administration worker. Medical and administration workers work in medical facilities that have a name, address, and possibly (not obligatory) specialization. Each medical worker is described as a unique staff number at a facility, name, address, and phone number.
A patient visits a medical facility for service of a health problem. Each service involves a patient, a medical worker, and an administration worker. The service can be a diagnosis, treatment, or checkup. A description and date of each service are recorded. Time spent on service and the costs are recorded as well.
A patient is eligible for his or her company health care benefits. Patient data includes name, id number (social security number), address (street, city, state, zip), and phone.
A medical worker must hold one or more credentials that are granted to work in a particular medical facility. Doctors are allowed to deliver diagnosis and give treatment based on their specialization. Paramedics are allowed to deliver only emergency diagnosis and treatment for any type of life-threatening problems. Nurses do not deliver diagnosis, but they do participate in treatment, particularly if the patient must be prepared for surgery or remain at the facility overnight.
The administration workers are concerned with personnel needs and assignments. Each medical worker must have at most one assignment at a facility. Several administration workers can be assigned to one assignment.
Medical facilities are located in different suburbs of different cities. A medical facility is uniquely identified by an address.
A data warehouse must be designed such it should be possible to easily implement the following classes of applications.
- The management of the medical facilities would like to get from a data warehouse information about the total number of medical services performed per medical facility, per year, per month per day, per city and per medical worker,
- the total length of medical services per medical facility, per year, per month per day, per city and per medical worker,
- the average length of medical services per medical facility, per year, per month per day, per city and per medical worker,
- the total number of doctors/paramedics/nurses involved in medical services, per year, per month per day, per medical facility, per city,
- the average time spent on medical services per year, month, day,
- the total costs of medical services per year, month, day, medical facility, and city.
To draw a conceptual schema, use the graphical notation introduced in Presentation 11: Conceptual Data Warehouse Design.
Follow the steps below to create a conceptual schema for a sample data warehouse domain:
- Identify a fact entity and the measures that describe it.
- Identify the dimensions.
- Identify hierarchies within each dimension.
- Identify the descriptions (attributes) of all entity types.
- Draw the conceptual schema.
You can use the UMLet diagram drawing tool and select the Conceptual Modelling notation. The notation selection is available in the top-right corner of UMLet’s main menu. UMLet version 14.3 can be downloaded from the subject’s Moodle website under the WEB LINKS section. Alternatively, a neat hand-drawn diagram is also acceptable.
Deliverables
A file solution2.pdf with a drawing of a conceptual schema of a sample data warehouse domain.
Buy Custom Answer of This Assessment & Raise Your Grades
Task 3 (5 marks)
Implementation of a table with a complex column type (0NF table) in Hive
Assume that we have a collection of semi-structured data with information about the employees (unique employee number and full name), the projects they are assigned to (project name and percentage of involvement) and their programming skills (the names of known programming languages). Some of the employees are on leave and they are not involved in any project. Also, some of the employees do not know any programming languages.
Few sample records from the collection are listed below.
007|James Bond|DB/3:30,Oracle:25,SQL-2022:100|Java,C,C++ 008|Harry Potter|DB/3:70,Oracle:75| 010|Robin Banks| |C,Rust 009|Robin Hood| |
- Implement HQL script solution3.hql that creates an internal relational table to store information about the employees, the projects they are assigned to (project name and percentage of involvement) and their programming skills.
- Include into the script INSERT statements that load sample data into the table. Insert at least 5 rows into the relational table created in the previous step. Two employees must participate in few projects and must know few programming languages. One employee must participate in few projects and must not know any programming languages. One employee must know few programming languages and must not participate in any projects. One employee must not know programming languages and must not participate in the projects.
- Include into the script SELECT statements that list the contents of the table. When ready, use a command line interface beeline to process a script solution3.hql and to save a report from processing in a file solution3.rpt. If the processing of the file returns the errors then you must eliminate the errors!
Deliverables
A file solution3.rpt with a report from the processing of HQL script solution3.hql. The report MUST NOT include any errors, and the report must list all SQL statements processed.
Task 4 (5 marks)
Implementation of a data warehouse as a collection of external tables in Hive
Consider the following two-dimensional data cube.
The data cube contains information about that parts that can be shipped by the suppliers. Download and unzip a file task4.zip. You should obtain a folder task4 with the following files: part.tbl, supplier.tbl, partsupp.tbl.
Use an editor to examine the contents of *.tbl files. Note, that the contents of the files can be loaded into the relational tables obtained from the transformation of the two-dimensional data cube given above into the relational table PART, SUPPLIER, and PARTSUPP.
Transfer the files into HDFS.
Implement HQL script solution4.hql that creates the external tables obtained from a step of logical design performed earlier. The external tables must overlap on the files transferred to HDFS in the previous step. Note, that a header in each *.tbl file must be removed before creating the external tables.
Include into solution4.hql script SELECT statements that any 5 rows from each one of the external tables implemented in the previous step and the total number of rows included in each table.
When ready, use a command line interface beeline to process a script solution4.hql and to save a report from processing in a file solution4.rpt.
Deliverables
A file solution4.rpt with a report from the processing of HQL script solution4.hql.
Submission of Assignment 2
Note, that you have only one submission. So, make absolutely sure that you submit the correct files with the correct contents. Please submit an Academic Consideration in SOLS if an extension (1 week maximally) is required.
Please combine all files into a single zipped file (A2-solutions.zip). Please submit the zipped file through Moodle in the following way:
- Access Moodle at http://moodle.uowplatform.edu.au/
- To login use a Login link located in the right upper corner the Web page or in the middle of the bottom of the Web page
- When logged select a site ISIT312 (SP225) Big Data Management
- Scroll down to a section SUBMISSIONS
- Click at Assignment 2 link.
- Click at a button Add Submission
- Move the zipped file A2-solutions.zip into an area You can drag and drop files here to add them. You can also use a link Add…
- Click at a button Save changes
- Click at a button Submit assignment
- Click at the checkbox with a text attached: By checking this box, I confirm that this submission is my own work, … in order to confirm authorship of your submission.
- Click at a button Continue
Stuck with a lot of homework assignments and feeling stressed ? Take professional academic assistance & Get 100% Plagiarism free papers
Struggling with your Assignment 2: CSCI312 Data Warehouse Design & Hive? Well! Stop worrying now. You are at the right place. Our platform provides AI-free SG assignment help. We have experienced writers who provide high-quality, no-plagiarism assignments with 100% original content, and here are free sample assignments that you can look over to see the quality of our work. Contact us right now and get high grades in your academics.
Looking for Plagiarism free Answers for your college/ university Assignments.
- HRM331: Talent Management – Strategic Shift from the War for Talent to the Wealth of Talent
- Marginalised Populations – The Structural and Cultural Exclusion of People Experiencing Homelessness in Singapore
- CVEN3501 Assignment 2: Groundwater Modelling of Drawdown from a Pumping Bore
- CH2123 Assignnment : Fugacity, VLE Modeling & Henry’s Law Applications
- BAFI1045 Assignment -Constructing and Evaluating Passive and Active Portfolios Based on the Straits Times Index (STI)
- PSB501EN Assignment 1: Engineering Systems Integration: A Multi-Technique Approach to Mechanical Analysis
- FIN2210E/FIN2212E Group Assignment: Financial Risk Management Analysis of Bursa Malaysia Companies
- FLM101 Assignment: A Cinematic Dissection: Stylistic Elements and Their Thematic Significance
- Assignment: Transforming Talent in the AI Era: From War to Wealth through Ecosystem Innovation
- COMP 1105 Assignment: Health-Focused E-Commerce Website: A Web Technologies Project Using HTML5, CSS, and JavaScript