BI Concepts and Topics to Explore

“The beginning of wisdom is a definition of terms” 

There are days when I decide to surf the web on a quest for new knowledge. Typically I start by googling some topic and let my will guide me through the articles and books I find interesting. This can be useful sometimes, but at other times I end up depressed by the number of new topics, concepts and architectures I find in the business intelligence field.

So I would like to share some of my latest finds with you and invite you to research them further, as they will most probably affect the way we see and build a data warehouse:

Big Data

Definition: It’s the process of mining information that is constantly changing in shape, speed and volume, with the goal of finding useful insights for decision support. A big data class problem is any business problem so large that it can’t be easily managed using a single processor:

  • Big data problems force you to move away from a single-processor environment toward the more complex world of distributed computing.
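To make the "split the work, then combine it" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function names (`partition`, `count_events`, `distributed_count`) and the event layout are made up, and a thread pool stands in for what would be separate machines in a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Split a list into n roughly equal chunks (one per hypothetical worker)."""
    return [items[i::n] for i in range(n)]


def count_events(chunk):
    """'Map' step: each worker counts events per product in its own chunk."""
    return Counter(event["product"] for event in chunk)


def distributed_count(events, workers=4):
    """Run the map step on every chunk, then 'reduce' the partial counts."""
    chunks = partition(events, workers)
    # The thread pool is a stand-in for real worker nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_events, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Each worker only ever sees its own chunk, which is the property that lets this kind of job grow beyond what a single processor can handle.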

Big data analytics is often associated with cloud computing because of the resources available there for storing, processing and querying large data sets, and because the technologies required for big data analysis can be hosted in the cloud.

Use Case: Big data might allow a company to collect billions of real-time data points on its products, resources, or customers – and then repackage that data instantaneously to optimize customer experience or resource utilization.

Data Lake

Definition: It’s a repository for large amounts of data, both structured and unstructured. It can be a staging area that feeds the data warehouse, a base for developing real-time analytics, or both. It accepts input from several sources and can preserve both the original data in full fidelity and the transformed data. Data models are learned from the data and evolve over time rather than being defined up front.

Use Case: Data lakes can help resolve the nagging problem of accessibility and data integration. Using big data infrastructures, enterprises are starting to pull together increasing data volumes for analytics or simply to store for undetermined future use.
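The "store first, decide the model later" idea can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not how any particular data lake product works: the `DataLake` class, the source names and the two parsers are all hypothetical.

```python
import csv
import io
import json


class DataLake:
    """Toy in-memory lake: keeps every payload exactly as it arrived."""

    def __init__(self):
        self._objects = []  # list of (source, raw_text) pairs

    def ingest(self, source, raw):
        # No schema is enforced on write; the original text is preserved.
        self._objects.append((source, raw))

    def read(self, source, parse):
        # The schema is applied only now, by whichever parser the reader picks.
        return [parse(raw) for src, raw in self._objects if src == source]


def parse_web_event(raw):
    """Hypothetical reader-side schema for JSON click events."""
    return json.loads(raw)


def parse_pos_line(raw):
    """Hypothetical reader-side schema for 'store,amount' CSV lines."""
    row = next(csv.reader(io.StringIO(raw)))
    return {"store": row[0], "amount": float(row[1])}
```

Because the raw text is never thrown away, a different parser can be written later for the same data without re-ingesting anything.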

Sharding

Definition: A type of database partitioning that separates very large databases into smaller, faster, more easily managed parts called data shards.

Use Case: If you have an extreme volume of structured data, you could spread the data across multiple relational database management system servers. You could then query across all the systems at once.
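Here is a rough sketch of what that could look like, assuming hash-based sharding (just one of several possible strategies). Plain Python dictionaries stand in for the separate database servers, and the names `shard_for` and `ShardedStore` are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard by hashing the key; the same key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store in which each dict plays the role of one database server."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, row):
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def query_all(self, predicate):
        # A query without the key has to visit every shard and merge the results.
        return [
            row
            for shard in self.shards
            for row in shard.values()
            if predicate(row)
        ]
```

Lookups by key stay cheap because they hit a single shard, while `query_all` shows the price of querying "across all the systems at once": every shard has to be visited.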

Data Vault

Definition: A database modeling method that is designed to provide long-term historical storage of data coming in from multiple operational systems.

  • The data vault consists of three core components: the Hub, the Link and the Satellite.
  • Hubs contain a list of unique business keys with a low propensity to change. Hubs also contain a surrogate key for each Hub item and metadata describing the origin of the business key.
    • The descriptive attributes for the information on the Hub (such as the description for the key, possibly in multiple languages) are stored in structures called Satellites.
  • A Satellite contains the descriptive information (context) for a business key.
  • A Link represents a natural business relationship between business keys and is established the first time a new unique association is presented to the EDW (enterprise data warehouse).
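The three components above can be sketched as tables. The snippet below uses Python's built-in sqlite3 module with an in-memory database. All table and column names (`hub_customer`, `sat_customer`, `link_customer_product` and so on) are my own illustrative choices, and real data vault models usually carry more metadata than this.

```python
import sqlite3

DDL = """
CREATE TABLE hub_customer (              -- Hub: business key + surrogate key
    customer_sk   INTEGER PRIMARY KEY,
    customer_bk   TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL          -- metadata: origin of the key
);
CREATE TABLE hub_product (
    product_sk    INTEGER PRIMARY KEY,
    product_bk    TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer (              -- Satellite: context, kept as history
    customer_sk INTEGER NOT NULL REFERENCES hub_customer(customer_sk),
    load_date   TEXT NOT NULL,
    name        TEXT,
    city        TEXT,
    PRIMARY KEY (customer_sk, load_date)
);
CREATE TABLE link_customer_product (     -- Link: relationship between hubs
    link_sk       INTEGER PRIMARY KEY,
    customer_sk   INTEGER NOT NULL REFERENCES hub_customer(customer_sk),
    product_sk    INTEGER NOT NULL REFERENCES hub_product(product_sk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    UNIQUE (customer_sk, product_sk)
);
"""


def build_vault():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def add_hub_key(conn, entity, business_key, load_date, source):
    """Insert a business key once and return its surrogate key.

    `entity` must be 'customer' or 'product' (it selects the hub table).
    """
    conn.execute(
        f"INSERT OR IGNORE INTO hub_{entity} "
        f"({entity}_bk, load_date, record_source) VALUES (?, ?, ?)",
        (business_key, load_date, source),
    )
    row = conn.execute(
        f"SELECT {entity}_sk FROM hub_{entity} WHERE {entity}_bk = ?",
        (business_key,),
    ).fetchone()
    return row[0]


def add_customer_context(conn, customer_sk, load_date, name, city):
    """Append a new Satellite row; older rows stay as history."""
    conn.execute(
        "INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
        (customer_sk, load_date, name, city),
    )


def add_link(conn, customer_sk, product_sk, load_date, source):
    """Record the association only the first time it is seen."""
    conn.execute(
        "INSERT OR IGNORE INTO link_customer_product "
        "(customer_sk, product_sk, load_date, record_source) "
        "VALUES (?, ?, ?, ?)",
        (customer_sk, product_sk, load_date, source),
    )


def current_city(conn, customer_sk):
    """Read the most recent Satellite row for a customer."""
    row = conn.execute(
        "SELECT city FROM sat_customer WHERE customer_sk = ? "
        "ORDER BY load_date DESC LIMIT 1",
        (customer_sk,),
    ).fetchone()
    return row[0] if row else None
```

The point of the sketch is that changes in descriptive data only ever add Satellite rows, while the Hub and Link rows are written once, which is where the long-term historical storage comes from.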

 

Use Cases: The data vault principles are especially well suited to an Enterprise Data Warehousing program and – when applied consistently – can provide the organization with some very compelling benefits. These include auditability, agility, adaptability, alignment with the business, and support for operational data warehousing initiatives.

Thanks,

Rui Machado
