Recommend a storage solution

When should you choose a data warehouse over a data lake?

To train a machine learning model with 150 GB of raw image data.
To store real-time social media posts that may be used for future analysis.
To store customer data that needs to be updated regularly.
To create accessible and isolated data repositories for other analysts.

It's only natural that you begin collecting data on your weekly running routine. You're most concerned with tracking how long you are running each week. You also record the route and the distances of your runs. You gather this data and put it into one table called Runs with the following schema:

[Image: schema of the Runs table]

After learning about dimensional modeling, you decide to restructure the schema for the database. Out of these possible answers, what would be the best way to organize the fact table and dimension tables?

A fact table holding duration_mins and foreign keys to dimension tables holding route details and week details, respectively.
A fact table holding week, month, year and foreign keys to dimension tables holding route details and duration details, respectively.
A fact table holding route_name, park_name, distance_km, city_name, and foreign keys to dimension tables holding week details and duration details, respectively.

Create a dimension table called route that will hold the route information.
Create a dimension table called week that will hold the week information.

SELECT
    -- Get the total duration of all runs
    SUM(duration_mins)
…
-- Get all the week_id's that are from July, 2019
INNER JOIN week_dim ON week_dim. … week_id
WHERE month = 'July' AND year = '2019'

Running from star to snowflake

Remember your running database from last chapter? After learning about the snowflake schema, you convert the current star schema into a snowflake schema. To do this, you normalize route_dim and week_dim.
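The running exercise can be tried end to end with a small in-memory database. The following is only a sketch under stated assumptions: route_dim, week_dim, duration_mins, week_id and the other column names come from the text, while the fact table name runs_fact, the route_id key, the snowflake tables (year_dim, month_dim, week_dim_sf) and all sample rows are made up for illustration. SQLite stands in for whatever database the exercise actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: one fact table plus two denormalized dimension tables.
# runs_fact and route_id are assumed names, not taken from the text.
cur.executescript("""
CREATE TABLE route_dim (
    route_id    INTEGER PRIMARY KEY,
    route_name  TEXT NOT NULL,
    park_name   TEXT NOT NULL,
    distance_km REAL NOT NULL,
    city_name   TEXT NOT NULL
);
CREATE TABLE week_dim (
    week_id INTEGER PRIMARY KEY,
    week    INTEGER NOT NULL,
    month   TEXT NOT NULL,
    year    TEXT NOT NULL  -- text, because the query compares year = '2019'
);
CREATE TABLE runs_fact (
    run_id        INTEGER PRIMARY KEY,
    duration_mins REAL NOT NULL,
    route_id      INTEGER NOT NULL REFERENCES route_dim (route_id),
    week_id       INTEGER NOT NULL REFERENCES week_dim (week_id)
);
""")

# Made-up sample rows.
cur.executemany("INSERT INTO route_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Lake Loop", "Riverside Park", 5.0, "Springfield"),
    (2, "Hill Trail", "North Park", 8.5, "Springfield"),
])
cur.executemany("INSERT INTO week_dim VALUES (?, ?, ?, ?)", [
    (1, 27, "July", "2019"),
    (2, 28, "July", "2019"),
    (3, 31, "August", "2019"),
])
cur.executemany("INSERT INTO runs_fact VALUES (?, ?, ?, ?)", [
    (1, 30.0, 1, 1),
    (2, 45.5, 2, 1),
    (3, 50.0, 2, 2),
    (4, 40.0, 1, 3),
])

# Star schema query: total duration of all runs in July 2019.
star_total = cur.execute("""
SELECT SUM(duration_mins)
FROM runs_fact
INNER JOIN week_dim ON week_dim.week_id = runs_fact.week_id
WHERE month = 'July' AND year = '2019';
""").fetchone()[0]

# Snowflake variant (assumed layout): week_dim is normalized so that
# month and year each live in their own table.
cur.executescript("""
CREATE TABLE year_dim (
    year_id INTEGER PRIMARY KEY,
    year    TEXT NOT NULL UNIQUE
);
CREATE TABLE month_dim (
    month_id INTEGER PRIMARY KEY,
    month    TEXT NOT NULL,
    year_id  INTEGER NOT NULL REFERENCES year_dim (year_id),
    UNIQUE (month, year_id)
);
CREATE TABLE week_dim_sf (
    week_id  INTEGER PRIMARY KEY,
    week     INTEGER NOT NULL,
    month_id INTEGER NOT NULL REFERENCES month_dim (month_id)
);
INSERT INTO year_dim (year) SELECT DISTINCT year FROM week_dim;
INSERT INTO month_dim (month, year_id)
    SELECT DISTINCT w.month, y.year_id
    FROM week_dim AS w
    INNER JOIN year_dim AS y ON y.year = w.year;
INSERT INTO week_dim_sf (week_id, week, month_id)
    SELECT w.week_id, w.week, m.month_id
    FROM week_dim AS w
    INNER JOIN year_dim AS y ON y.year = w.year
    INNER JOIN month_dim AS m
        ON m.month = w.month AND m.year_id = y.year_id;
""")

# The same question against the snowflake tables needs two extra joins.
snowflake_total = cur.execute("""
SELECT SUM(f.duration_mins)
FROM runs_fact AS f
INNER JOIN week_dim_sf AS w ON w.week_id = f.week_id
INNER JOIN month_dim AS m ON m.month_id = w.month_id
INNER JOIN year_dim AS y ON y.year_id = m.year_id
WHERE m.month = 'July' AND y.year = '2019';
""").fetchone()[0]

print(star_total, snowflake_total)
```

Both queries return the same total for the sample rows. The snowflake version stores each month and year value once, at the cost of longer join chains, which is the usual trade-off when normalizing dimension tables.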
You should now be familiar with the differences between OLTP and OLAP. In this exercise, you are given a list of cards describing a specific approach which you will categorize between OLAP and OLTP. Categorize the cards into the approach that they describe best.

Help businesses with decision making and problem solving.
Data is inserted and updated more often.
Most likely to have data from past hour.

The city of Chicago receives many 311 service requests throughout the day. 311 service requests are non-urgent community requests, ranging from graffiti removal to street light outages. Chicago maintains a data repository of all these services organized by type of requests. In this exercise, Potholes has been loaded as an example of a table in this repository. It contains pothole reports made by Chicago residents from the past week.

Explore the dataset. What data processing approach is this larger repository most likely using?

OLTP because this table could not be used for any analysis.
OLAP because each record has a unique service request number.
OLTP because this table's structure appears to require frequent updates.
OLAP because this table focuses on pothole requests only.

Answer: OLTP because this table's structure appears to require frequent updates.

You have been hired to manage data at a small online clothing store. Their system is quite outdated because their only data repository is a traditional database to record transactions. You decide to upgrade their system to a data warehouse after hearing that different departments would like to run their own business analytics. You reason that an ELT approach is unnecessary because there is relatively little data (< 50 GB).

In the ETL flow you design, different steps will take place. Place the steps in the most appropriate order.

eCommerce API outputs real-time data of transactions.
Python script drops null rows and cleans data into predetermined columns.
Resulting dataframe is written into AWS Redshift Warehouse.
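The three ETL steps from the clothing-store exercise (API output, cleaning script, warehouse write) might look roughly like the sketch below. Everything concrete in it is an assumption for illustration: the column names, the sample records, and the transactions table are invented, an in-memory SQLite database stands in for AWS Redshift, and plain tuples are used instead of a dataframe so the example runs with the standard library alone.

```python
import sqlite3

# Predetermined target columns (hypothetical names for this sketch).
COLUMNS = ("order_id", "item", "amount_usd")


def extract():
    """Extract: stand-in for the eCommerce API returning raw transactions."""
    return [
        {"order_id": 1, "item": "t-shirt", "amount_usd": 19.99, "session": "a1"},
        {"order_id": 2, "item": None, "amount_usd": 35.00, "session": "b2"},
        {"order_id": 3, "item": "jeans", "amount_usd": 49.50, "session": "c3"},
    ]


def transform(records):
    """Transform: keep the predetermined columns, drop rows with nulls."""
    rows = []
    for record in records:
        row = tuple(record.get(col) for col in COLUMNS)
        if all(value is not None for value in row):
            rows.append(row)
    return rows


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "order_id INTEGER PRIMARY KEY, item TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the Redshift warehouse
load(conn, transform(extract()))

loaded = conn.execute(
    "SELECT order_id, item, amount_usd FROM transactions ORDER BY order_id"
).fetchall()
print(loaded)
```

The record with a missing item is dropped during the transform step, so only cleaned rows ever reach the warehouse. In an ELT flow the raw records would be loaded first and cleaned inside the warehouse instead.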