Author: rpmachado

My name is Rui Machado, I am 26 years old, an informatics professional specialized in Information Systems, more particularly in Business Intelligence and database development. I also have a special taste for software engineering. I have a degree in Information Systems, a postgraduate degree in Business Intelligence, and I am now finishing my Master's degree in Informatics. Although I live in Vila Nova de Gaia, Porto, Portugal, my imagination lives all over the world and I can't avoid my will to change and learn new things. For me there is no single life; you can live as many as you want, you just need the courage to materialize your dreams and bury the past so deep that when someone finds it, you will be buried deeper. Throughout my life I have been a sports fan, playing football, roller hockey, karate, handball and table tennis. I am now focused on practicing surfing and skateboarding :) I have won several programming contests, have been a speaker at various seminars in the areas of innovation and personal growth, and have been awarded four merit distinctions by my university and the Portuguese government. I am also a writer for Syncfusion, having already written three eBooks: SQL Server Integration Services 2012, PowerShell and BizTalk Server.

[PowerShell] Auto-Refresh Excel PowerPivot Data

When working with Excel, a common task is connecting to your databases in order to create, for example, a data abstraction layer for your analysis, which can be accomplished with PowerPivot.

Although Excel comes with many wizards to connect to databases and even to refresh your data, when you start developing multiple reports and the number of files you need to refresh, for example every day, increases, this can become a headache that is not easy to manage.

In order to ease your life, the following PowerShell script accepts as a parameter a folder path where you can put all your Excel files, and then refreshes all the PowerPivot models in them.



param([string]$input_folder_name)

[System.Threading.Thread]::CurrentThread.CurrentCulture = [System.Globalization.CultureInfo]::GetCultureInfo("en-US")
$interopAssembly = [Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Interop.Excel")

$allWbks = @(Get-ChildItem -Path $input_folder_name -Filter "*.xls*")
$allWbks | %{

    $file = $_.FullName
    "File: " + $file

    #Kill all instances of MS Excel so the file is not locked
    Get-Process | where{$_.ProcessName -like "*EXCEL*"} | kill

    #Create a new Excel app and open the workbook
    $excel = New-Object Microsoft.Office.Interop.Excel.ApplicationClass
    $excel.DisplayAlerts = $false
    $book = $excel.Workbooks.Open($file)

    $book.Connections | %{

        #If PowerPivot Model (Type=7)
        if($_ -ne $null -and $_.Type -eq 7){
            "Refreshing PowerPivot Model: " + $_.Name
            $_.Refresh()
        }
    }

    #Save the refreshed workbook and release Excel
    $book.Save()
    $book.Close()
    $excel.Quit()
}

"Finished Processing"

Thank you




[Python] Apply SHA-3 to MySQL Dataset with pysha3

SHA-3, a subset of the cryptographic primitive family Keccak, is a cryptographic hash function designed to be very efficient in hardware but relatively slow in software. SHA-3 takes about double the time of SHA-2 to run in software and about a quarter of the time to run in hardware.

Although many of you might still be discovering the newest NIST adoption, SHA-3, some companies are already trying to use this algorithm to securely hash their data.

In the latest project I was involved in, we had a very particular scenario in which we decided to use SHA-3:

  • We were interested in comparing the percentage of common customers among several companies of the same group

For example:

    • COMPANY A: C1, C2
    • COMPANY B: C1, C3

In this example we can see that customer C1 exists in both companies while C2 and C3 each exist in only one, so we can say that "33% of our customers are shared between COMPANY A and COMPANY B".

Because we were not interested in comparing real customer data, like name or address, and there were legal constraints on sharing customer data between companies, we decided to hash with SHA-3 the customer name and phone number, which the group had defined as a unique key, and share the results as CSV.

To orchestrate this we decided to develop a Python script with the following structure:

  1. Query MySQL and retrieve customer data
  2. Hash the customer Name and Phone column with SHA-3 with the library Pysha3
  3. Export data set to CSV

In the following code block you will find the script, which I hope can be useful for you, followed by a simple explanation of how to use it.
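A minimal sketch of the three steps above, assuming the unique key is name plus phone number: it uses Python's built-in hashlib, which exposes the same SHA3-256 as pysha3 on Python 3.6+, and keeps the MySQL query as a commented placeholder so the hashing and CSV export logic stay self-contained. The table and column names are hypothetical.

```python
import csv
import hashlib

def hash_customer(name, phone):
    """Hash the unique customer key (name + phone) with SHA3-256."""
    key = (name + phone).encode("utf-8")
    return hashlib.sha3_256(key).hexdigest()

def export_hashed_customers(rows, csv_path):
    """Write one SHA3-256 hash per customer row to a CSV file."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_hash"])
        for name, phone in rows:
            writer.writerow([hash_customer(name, phone)])

if __name__ == "__main__":
    # Step 1 would normally be a MySQL query, e.g. with mysql-connector:
    #   cursor.execute("SELECT name, phone FROM customers")
    #   rows = cursor.fetchall()
    # Sample data is used here so the hashing and export run end to end.
    rows = [("C1", "911111111"), ("C2", "922222222")]
    export_hashed_customers(rows, "customers_hashed.csv")
```

Because SHA3-256 is deterministic, the same customer hashed by two different companies produces the same value, so the CSV files can be compared without ever exchanging the real names or phone numbers.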


Star Schema 101

For all of you joining the world of analytics, I would like to share a small tutorial on the star schema that you can use to optimize your analytic queries and data storage in databases. Although it's one of the most used dimensional modelling techniques, I would like to reinforce that this data organization architecture is not just for data warehousing: you can also use it as a reference model to create, for example, PowerPivot models based on Excel files, which will this way be faster and more optimized.

For this post I have decided to follow a problem/solution approach.
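As a taste of the idea, here is a minimal star schema sketch, with hypothetical table and column names, built in SQLite through Python's sqlite3 so it is self-contained: one fact table of sales surrounded by dimension tables, queried with a single join per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables: descriptive attributes, one row per member
conn.execute("CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT)")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT)")

# Fact table: foreign keys to each dimension plus numeric measures
conn.execute("""CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    amount      REAL
)""")

conn.execute("INSERT INTO dim_product  VALUES (1, 'Surfboard'), (2, 'Skateboard')")
conn.execute("INSERT INTO dim_customer VALUES (1, 'C1'), (2, 'C2')")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (1, 2, 150.0), (2, 1, 80.0)])

# A typical analytic query: aggregate the fact table, sliced by a dimension attribute
rows = conn.execute("""
    SELECT p.product_name, SUM(f.amount) AS total_sales
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)
```

The key point is that every analytic query follows the same shape: aggregate the measures in the fact table and join out to whichever dimensions you want to slice by.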


BI Concepts and Topics to Explore

“The beginning of wisdom is a definition of terms” 

There are days in which I decide to surf the web on a quest for new knowledge; typically I start by "googling" some topic and let my will guide me through the articles and books I find interesting. Of course this can be useful sometimes, but other times I end up depressed by the amount of new topics, concepts and architectures I find in the Business Intelligence field.

This way I would like to share with you guys some of my latest finds and invite you to research more on these, as they can most probably affect the way we see and build a data warehouse:


Download My SSIS eBook for Free [US-EN]

I am the author of this book, which was written for Syncfusion to expand their offer in the Succinctly series.


SSIS Succinctly

SQL Server Integration Services is part of Microsoft’s business intelligence suite and an ETL (extract, transform, and load) tool. It does more than just move data between databases. It can be used to clean and transform data so that it can be used by data warehouses or even OLAP-based systems. With SSIS Succinctly by Rui Machado, you will learn how to build and deploy your own ETL solution in a drag-and-drop development environment by using SSIS packages, control flows, data flows, tasks, and more.


Download My PowerShell eBook for Free [US-EN]

I am the author of this book, which was written for Syncfusion to expand their offer in the Succinctly series. Two more books are queued for release, so wait for more news in the following weeks.


PowerShell Succinctly highlights some of the PowerShell programming model’s many benefits, specifically for .NET developers and system administrators. Author Rui Machado guides readers through time-saving methods that simplify code testing by eliminating the need to create a new application in Visual Studio. Also included are tips for using additional services, such as PowerGui, WMI, and SQL Server, to get the most out of PowerShell. Even if you don’t already use scripting languages to manage your machines, PowerShell Succinctly will show you just how easy it is to automate activities, work with databases, and interact with a variety of file types with this useful model.


DW TIP: Get next or previous value with SQL Analytic Function [EN-US; PLSQL]

While querying our databases we might face a typical problem, which is getting the next or the previous value of an attribute according to some rule applied to a certain dataset. This happens more often if you deal with data warehouses and need to retrieve this kind of analytical information. The typical solutions involve several "group by" clauses and subqueries to achieve the same result. For this, SQL has a powerful feature: analytic functions.

Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.
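For instance, the LAG and LEAD analytic functions return the previous and next value in the window without any self-join or subquery. The sketch below uses SQLite (3.25+) through Python's sqlite3 so it runs as-is; the LAG/LEAD syntax is the same in Oracle PL/SQL, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("C1", "2014-01-01", 10.0),
     ("C1", "2014-02-01", 20.0),
     ("C1", "2014-03-01", 30.0)],
)

# For each order, fetch the previous and next order amount per customer.
# The OVER clause defines the window: partitioned by customer, ordered by date.
rows = conn.execute("""
    SELECT order_date,
           amount,
           LAG(amount)  OVER (PARTITION BY customer ORDER BY order_date) AS prev_amount,
           LEAD(amount) OVER (PARTITION BY customer ORDER BY order_date) AS next_amount
    FROM orders
    ORDER BY order_date
""").fetchall()
for r in rows:
    print(r)
```

The first row has no previous value and the last has no next value, so LAG and LEAD return NULL there; both functions also accept an optional default for exactly that case.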


Social Media Marketing KPI’s

As you know, companies use Key Performance Indicators (KPIs) to monitor their performance against the objectives they define. If the value expected for a period is lower than the current one, your KPI is doing well; otherwise you should take actions to correct whatever is going wrong.

Although commonly used in Business Intelligence, these same KPIs can be used to analyze the performance of your company in many different contexts. Let's take a look, for instance, at Social Media Marketing, or as I have been hearing more often, Marketing Intelligence. In this context you want to analyze how your company is performing in social media channels and compare your results with those of your competitors. To do so, I have been investigating how you can gather data about your competitors and how you can analyze it.
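As a minimal sketch of the mechanism, with hypothetical numbers: a KPI is just a measured value compared against the target defined for the period, here an engagement rate on a social channel.

```python
def kpi_status(actual, target):
    """Compare a measured value with its target for the period."""
    if actual >= target:
        return "on track"
    return "needs action"

# Hypothetical social media KPI: engagement rate = interactions / followers
followers = 5000
interactions = 450
engagement_rate = interactions / followers

print(kpi_status(engagement_rate, target=0.05))
```

The same comparison applied to a competitor's public numbers is what turns a plain metric into Marketing Intelligence.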


Why do managers need data analysis?

Nowadays I often see managers talking about Business Intelligence, data mining and big data as if they were the best things in the world, and yet when they open their laptops they still manage companies in their Excel sheets, referring to big data as a 10 MB Excel file and to Business Intelligence as creating charts with lots of colors. When I look at them and hear their strategies for information management in their companies, I imagine a blind person trying to solve a Rubik's Cube.


Enable Change Data Capture (SQL Server 2008+)

Along with Slowly Changing Dimensions, Change Data Capture is one of the most important mechanisms to handle changes. While Slowly Changing Dimensions define how you want to treat changes in your dimension attributes, Change Data Capture allows you to define how you want to handle the extraction of changed records from the source systems.

As you can imagine, the extraction of data from operational sources is very important in Extract, Transform and Load operations using Integration Services or any other provider, because you do not want to process all the data again every time you run your packages; you just want to process the records that have changed (inserts, updates and deletes). If you schedule your ETL process every day at 2 in the morning against a four-year-old operational database, you don't want to reprocess the data from those four years, just the records that have changed since the previous night; this way your ETL process will be faster and much more efficient.
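In SQL Server 2008+, CDC is enabled with the sys.sp_cdc_enable_db and sys.sp_cdc_enable_table stored procedures, after which changed rows can be read from the generated change tables. The incremental idea itself can be sketched independently of SQL Server: keep a watermark of the last extraction and pull only rows modified after it. The sketch below, in Python with hypothetical row data, shows that pattern.

```python
from datetime import datetime

def extract_changed_rows(source_rows, last_watermark):
    """Return only rows modified after the previous extraction, plus the new watermark."""
    changed = [r for r in source_rows if r["modified_at"] > last_watermark]
    new_watermark = max((r["modified_at"] for r in changed), default=last_watermark)
    return changed, new_watermark

source_rows = [
    {"id": 1, "modified_at": datetime(2013, 1, 1)},  # old row, already processed
    {"id": 2, "modified_at": datetime(2014, 6, 2)},  # changed since last run
]
changed, watermark = extract_changed_rows(source_rows, datetime(2014, 6, 1))
print([r["id"] for r in changed])  # only the changed record is processed
```

CDC improves on this watermark pattern because the change tables also record deletes and the type of each operation, which a plain modified-date column cannot capture.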