Total Pageviews

May 20, 2015

5/20/2015 10:57:00 PM

Dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can imagine, the same data would then be stored differently in a dimensional model than in a 3rd normal form model.
To understand dimensional data modeling, let's define some of the terms commonly used in this type of modeling:
Dimension: A category of information. For example, the time dimension.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.
Hierarchy: The specification of levels that represents relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.
Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: A date column, a store column, and a sales amount column.
Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have several fields, one for the unique ID that identifies the quarter, and one or more additional fields that specifies how that particular quarter is represented on a report (for example, first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").
A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another. Dimensions and hierarchies are represented by lookup tables. Attributes are the non-key columns in the lookup tables.
In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema and Snowflake Schema.
Whether one uses a star or a snowflake largely depends on personal preference and business needs. Personally, I am partial to snowflakes, when there is a business case to analyze the information at that particular level.


A conceptual data model identifies the highest-level relationships between the different entities. Features of conceptual data model include:
  • Includes the important entities and the relationships among them.
  • No attribute is specified.
  • No primary key is specified.
The figure below is an example of a conceptual data model.

Conceptual Data Model
Conceptual Data Model
From the figure above, we can see that the only information shown via the conceptual data model is the entities that describe the data and the relationships between those entities. No other information is shown through the conceptual data model.

  logical data model describes the data in as much detail as possible, without regard to how they will be physical implemented in the database. Features of a logical data model include:
  • Includes all entities and relationships among them.
  • All attributes for each entity are specified.
  • The primary key for each entity is specified.
  • Foreign keys (keys identifying the relationship between different entities) are specified.
  • Normalization occurs at this level.
The steps for designing the logical data model are as follows:
  1. Specify primary keys for all entities.
  2. Find the relationships between different entities.
  3. Find all attributes for each entity.
  4. Resolve many-to-many relationships.
  5. Normalization.
The figure below is an example of a logical data model.


Logical Data Model
Comparing the logical data model shown above with the conceptual data model diagram, we see the main differences between the two:
  • In a logical data model, primary keys are present, whereas in a conceptual data model, no primary key is present.
  • In a logical data model, all attributes are specified within an entity. No attributes are specified in a conceptual data model.
  • Relationships between entities are specified using primary keys and foreign keys in a logical data model. In a conceptual data model, the relationships are simply stated, not specified, so we simply know that two entities are related, but we do not specify what attributes are used for this relationship.
  • Physical data model represents how the model will be built in the database. A physical database model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables. Features of a physical data model include:
  • Specification all tables and columns.
  • Foreign keys are used to identify relationships between tables.
  • Denormalization may occur based on user requirements.
  • Physical considerations may cause the physical data model to be quite different from the logical data model.
  • Physical data model will be different for different RDBMS. For example, data type for a column may be different between MySQL and SQL Server.
  • The steps for physical data model design are as follows:
  • Convert entities into tables.
  • Convert relationships into foreign keys.
  • Convert attributes into columns.
  • Modify the physical data model based on physical constraints / requirements.
  • The figure below is an example of a physical data model.


Physical data model represents how the model will be built in the database. A physical database model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables. Features of a physical data model include:
Specification all tables and columns.
Foreign keys are used to identify relationships between tables.
Denormalization may occur based on user requirements.
Physical considerations may cause the physical data model to be quite different from the logical data model.
Physical data model will be different for different RDBMS. For example, data type for a column may be different between MySQL and SQL Server.
The steps for physical data model design are as follows:
Convert entities into tables.
Convert relationships into foreign keys.
Convert attributes into columns.
Modify the physical data model based on physical constraints / requirements.
The figure below is an example of a physical data model.

Physical Data Model
Comparing the logical data model shown above with the logical data model diagram, we see the main differences between the two:
Entity names are now table names.
Attributes are now column names.
Data type for each column is specified. Data types can be different depending on the actual database being used

Junk Dimension
In data warehouse design, frequently we run into a situation where there are yes/no indicator fields in the source system. Through business analysis, we know it is necessary to keep those information in the fact table. However, if keep all those indicator fields in the fact table, not only do we need to build many small dimension tables, but the amount of information stored in the fact table also increases tremendously, leading to possible performance and management issues.
Junk dimension is the way to solve this problem. In a junk dimension, we combine these indicator fields into a single dimension. This way, we'll only need to build a single dimension table, and the number of fields in the fact table, as well as the size of the fact table, can be decreased. The content in the junk dimension table is the combination of all possible values of the individual indicator fields.
Let's look at an example. Assuming that we have the following fact table:
Fact Table Before Junk Dimension
In this example, the last 3 fields are all indicator fields. In this existing format, each one of them is a dimension. Using the junk dimension principle, we can combine them into a single junk dimension, resulting in the following fact table:
Fact Table With Junk Dimension
Note that now the number of dimensions in the fact table went from 7 to 5.
The content of the junk dimension table would look like the following:
Junk Dimension Example
In this case, we have 3 possible values for the TXN_CODE field, 2 possible values for the COUPON_IND field, and 2 possible values for the PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows for the junk dimension table.
By using a junk dimension to replace the 3 indicator fields, we have decreased the number of dimensions by 2 and also decreased the number of fields in the fact table by 2. This will result in a data warehousing environment that offer better performance as well as being easier to manage.
What Is OLAP
OLAP stands for On-Line Analytical Processing. The first attempt to provide a definition to OLAP was by Dr. Codd, who proposed 12 rules for OLAP. Later, it was discovered that this particular white paper was sponsored by one of the OLAP tool vendors, thus causing it to lose objectivity. The OLAP Report has proposed the FASMI test, Fast Analysis of Shared Multidimensional Information. For a more detailed description of both Dr. Codd's rules and the FASMI test, please visit The OLAP Report.
For people on the business side, the key feature out of the above list is "Multidimensional." In other words, the ability to analyze metrics in different dimensions such as time, geography, gender, product, etc. For example, sales for the company is up. What region is most responsible for this increase? Which store in this region is most responsible for the increase? What particular product category or categories contributed the most to the increase? Answering these types of questions in order means that you are performing an OLAP analysis.


 
Related Posts Plugin for WordPress, Blogger...