Lookup Tables
Last updated
Last updated
If you use Psoxy to send pseudonymized data to Worklytics and later wish to re-identify the data that you export from Worklytics to your premises, you'll need a lookup table in your data warehouse to JOIN with that data.
Our aws-host
Terraform module, as used in our Psoxy AWS Example, provides a variable lookup_table_builders
to control generation of these lookup tables.
Populating this variable will generate another version of your HRIS data (aside from the one exposed to Worklytics) which you can then import back to your data warehouse.
To enable it, add the following to your terraform.tfvars
file:
In sanitized_accessor_role_names
, add the name of whatever AWS role that the principal running ingestion of your lookup table from S3 to your data warehouse will assume. You can add additional role names as needed. Alternatively, you can use an IAM policy created outside of our Terraform module to grant access to the lookup table CSVs within the S3 bucket.
After you apply this configuration, the lookup table will be generated in an S3 bucket. The S3 bucket will be shown in the Terraform output:
Use the bucket name shown in your output to build import pipeline to your data warehouse.
If your input file follows the standard HRIS schema for Worklytics, it will have SNAPSHOT,EMPLOYEE_ID,EMPLOYEE_EMAIL,JOIN_DATE,LEAVE_DATE,MANAGER_ID
columns, at minimum.
Every time a new hris snapshot is uploaded to the hris -input
bucket, TWO copies of it will be created: a sanitized copy in the bucket accessible Worklytics, and the lookup variant in the lookup bucket referenced above (not accessible to Worklytics).
The lookup table CSV file will have the following columns: EMPLOYEE_ID,EMPLOYEE_ID_ORIG
If you load this into your Data Warehouse, you can JOIN it with the data you export from Worklytics.
Eg, assuming you've exported the Worklytics Weekly aggregates data set to your data warehouse, load the files from S3 bucket above into a table named lookup_hris
.
Then the following query will give re-identified aggregate data:
The employeeId
column in the result set will be the original employee ID from your HRIS system.
If your HRIS employee ID column is considered PII, then the lookup table and any re-identified data exports you use it to produce should be handled as Personal data, according to your policies, as these now reference readily identifiable Natural Persons.
If you wish limit re-identification to a subset of your data, you can use additional columns present in your HRIS csv to do so, for example:
Within the lookup_table_builders
map, you can specify the following fields:
input_connector_id
- usually hris
; this corresponds the whatever bulk connector you want to build the lookup table for.
rules
- this follows the rules structure for the bulk connector case. The example above is suited for HRIS data following the schema expected by Worklytics. If you modify this, be sure to review our documentation or contact support to ensure you don't break your lookup table.