[Python] Apply SHA-3 to MySQL Dataset with pysha3

SHA-3, a subset of the Keccak family of cryptographic primitives, is a cryptographic hash function designed to be very efficient in hardware but relatively slow in software. In software, SHA-3 takes roughly twice as long as SHA-2 to run; in hardware, it takes roughly a quarter of the time.

Although many of you might still be discovering SHA-3, the newest hash standard adopted by NIST, some companies are already implementing the algorithm to protect their data by hashing it.

In the latest project I was involved in, we had a very particular scenario in which we decided to use SHA-3:

  • We were interested in comparing the percentage of common customers among several companies of the same group

For example:

    • COMPANY A: C1, C2
    • COMPANY B: C1, C3

In this example, customer C1 exists in both companies, while C2 and C3 each exist in only one. One of the three distinct customers is shared, so we can say that “33% of our customers are shared between COMPANY A and COMPANY B”.
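The calculation above can be sketched with Python sets. This is only an illustration: the function name shared_percentage is hypothetical, and I am assuming that C2 belongs to COMPANY A and C3 to COMPANY B, and that the percentage is taken over all distinct customers across both companies.

```python
def shared_percentage(company_a, company_b):
    """Return the percentage of distinct customers present in both companies."""
    a, b = set(company_a), set(company_b)
    all_customers = a | b          # every distinct customer across both companies
    if not all_customers:
        return 0.0                 # avoid dividing by zero on empty inputs
    return 100.0 * len(a & b) / len(all_customers)


# Assumed assignment of customers to companies (for illustration only)
company_a = ["C1", "C2"]
company_b = ["C1", "C3"]

print(f"{shared_percentage(company_a, company_b):.0f}% of customers are shared")
# prints "33% of customers are shared"
```

In the real scenario the lists would hold hashed identifiers rather than customer names, but the set logic stays the same.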

We were not interested in comparing real customer data such as name or address, and there were legal constraints on sharing customer data between companies. We therefore decided to hash with SHA-3 the customer's name and phone number, a combination the group had defined as unique, and to share the results as CSV.
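As a minimal sketch of that hashing step, the snippet below uses hashlib.sha3_256, which ships with Python 3.6 and later; on older versions, importing the sha3 module from pysha3 adds the same constructors to hashlib. The function name hash_customer, the choice of the 256-bit variant, the "|" separator and the normalization (trimming whitespace, lowercasing the name) are all assumptions made for illustration, not necessarily what the full script does.

```python
import hashlib


def hash_customer(name, phone):
    """Return a SHA3-256 hex digest for a customer's name and phone number."""
    # Assumed normalization: every company must build the key the same way,
    # otherwise the same customer produces different hashes.
    key = f"{name.strip().lower()}|{phone.strip()}"
    return hashlib.sha3_256(key.encode("utf-8")).hexdigest()


# The same customer, entered slightly differently in two systems
digest_a = hash_customer("John Doe", "555-0100")
digest_b = hash_customer("  john doe ", "555-0100")

print(digest_a == digest_b)  # prints "True"
print(len(digest_a))         # prints "64"
```

Because each company shares only the digests, the overlap can be computed by comparing hashes without exchanging names or phone numbers.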

To orchestrate this we decided to develop a Python script with the following structure:

  1. Query MySQL and retrieve customer data
  2. Hash the customer Name and Phone columns with SHA-3 using the pysha3 library
  3. Export data set to CSV

In the following code block you will find the entire script, which I hope will be useful to you, followed by a simple explanation of how to use it.