If you use Psoxy to send pseudonymized data to Worklytics and later wish to re-identify the data that you export from Worklytics to your premises, you'll need a lookup table in your data warehouse to JOIN with that data.
Our aws-host Terraform module, as used in our Psoxy AWS Example, provides a variable lookup_table_builders to control generation of these lookup tables.
Populating this variable will generate another version of your HRIS data (aside from the one exposed to Worklytics) which you can then import back to your data warehouse.
To enable it, add the following to your terraform.tfvars file:
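A minimal sketch of what such an entry might look like. The map key "lookup-hris", the role name, and the key names inside rules (pseudonymFormat, columnsToDuplicate, columnsToPseudonymize, columnsToInclude) are assumptions for illustration; only lookup_table_builders, input_connector_id, sanitized_accessor_role_names, and rules come from this guide. Check the variable definition in the version of the module you use before applying.

```hcl
# terraform.tfvars (sketch, not a definitive configuration)
lookup_table_builders = {
  # Hypothetical key; names this lookup table builder.
  "lookup-hris" = {
    # The bulk connector whose input should feed the lookup table.
    input_connector_id = "hris"

    # Hypothetical role name; replace with the AWS role(s) your
    # warehouse ingestion process assumes.
    sanitized_accessor_role_names = [
      "example-warehouse-ingestion-role",
    ]

    # ASSUMPTION: rule key names below are illustrative. The intent is to
    # keep the original ID in EMPLOYEE_ID_ORIG next to the pseudonymized
    # EMPLOYEE_ID and to drop every other column.
    rules = {
      pseudonymFormat = "URL_SAFE_TOKEN"
      columnsToDuplicate = {
        "EMPLOYEE_ID" = "EMPLOYEE_ID_ORIG"
      }
      columnsToPseudonymize = ["EMPLOYEE_ID"]
      columnsToInclude      = ["EMPLOYEE_ID", "EMPLOYEE_ID_ORIG"]
    }
  }
}
```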
In sanitized_accessor_role_names, add the name of the AWS role that the principal running ingestion of your lookup table from S3 to your data warehouse will assume. You can add additional role names as needed. Alternatively, you can use an IAM policy created outside of our Terraform module to grant access to the lookup table CSVs within the S3 bucket.
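If you take the alternative route of managing access yourself, a policy along the following lines could grant read access to the lookup CSVs. This is a sketch only: the bucket name, policy name, and role name are hypothetical placeholders, and your organization may require tighter resource or condition constraints.

```hcl
# Hypothetical names throughout; replace with your own bucket and role.
data "aws_iam_policy_document" "read_lookup_tables" {
  statement {
    sid       = "ListLookupBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::example-lookup-bucket"]
  }

  statement {
    sid       = "ReadLookupCsvs"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::example-lookup-bucket/*"]
  }
}

resource "aws_iam_policy" "read_lookup_tables" {
  name   = "ExampleReadPsoxyLookupTables"
  policy = data.aws_iam_policy_document.read_lookup_tables.json
}

# Attach the policy to the role your ingestion process assumes.
resource "aws_iam_role_policy_attachment" "warehouse_ingestion" {
  role       = "example-warehouse-ingestion-role"
  policy_arn = aws_iam_policy.read_lookup_tables.arn
}
```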
After you apply this configuration, the lookup table will be generated in an S3 bucket. The S3 bucket will be shown in the Terraform output:
Use the bucket name shown in your output to build an import pipeline to your data warehouse.
If your input file follows the standard HRIS schema for Worklytics, it will have SNAPSHOT,EMPLOYEE_ID,EMPLOYEE_EMAIL,JOIN_DATE,LEAVE_DATE,MANAGER_ID columns, at minimum.
Every time a new HRIS snapshot is uploaded to the hris -input bucket, TWO copies of it will be created: a sanitized copy in the bucket accessible to Worklytics, and the lookup variant in the lookup bucket referenced above (not accessible to Worklytics).
The lookup table CSV file will have the following columns: EMPLOYEE_ID,EMPLOYEE_ID_ORIG
If you load this into your data warehouse, you can JOIN it with the data you export from Worklytics.
For example, assuming you've exported the Worklytics Weekly aggregates data set to your data warehouse, load the files from the S3 bucket above into a table named lookup_hris.
Then the following query will give re-identified aggregate data:
The employeeId column in the result set will be the original employee ID from your HRIS system.
If your HRIS employee ID column is considered PII, then the lookup table and any re-identified data exports you use it to produce should be handled as Personal data, according to your policies, as these now reference readily identifiable Natural Persons.
If you wish to limit re-identification to a subset of your data, you can use additional columns present in your HRIS CSV to do so, for example:
Within the lookup_table_builders map, you can specify the following fields:
input_connector_id - usually hris; this corresponds to whatever bulk connector you want to build the lookup table for.
rules - this follows the rules structure for the bulk connector case. The example above is suited for HRIS data following the schema expected by Worklytics. If you modify this, be sure to review our documentation or contact support to ensure you don't break your lookup table.
By default, Psoxy uses AWS Systems Manager Parameter Store to store secrets; this simplifies configuration and minimizes costs. However, you may want to use AWS Secrets Manager to store secrets due to organization policy.
In such a case, you can add the following to your terraform.tfvars file:
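As an illustration, the setting might look like the line below. Both the variable name secrets_store_implementation and the value aws_secrets_manager are assumptions not stated in this guide; confirm them against the variables declared by the module version you deploy.

```hcl
# terraform.tfvars (sketch)
# ASSUMPTION: variable name and value are illustrative; verify against
# the module's variables.tf before applying.
secrets_store_implementation = "aws_secrets_manager"
```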
This will alter the behavior of the Terraform modules so that everything considered a secret is stored in, and loaded from, AWS Secrets Manager instead of AWS Systems Manager Parameter Store. Note that Parameter Store is still used for non-secret configuration information, such as proxy rules.
Changes will also be made to AWS IAM Policies, to allow lambda function execution roles to access Secrets Manager as needed.
If any secrets are managed outside of Terraform (such as API keys for certain connectors), you will need to grant access to relevant secrets in Secrets Manager to the principals that will manage these.
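One way to grant such access is a policy scoped to the relevant secrets, sketched below. The region, account ID, secret name prefix, and policy name are hypothetical placeholders, and the action list is an assumption about what a secret-managing principal typically needs; trim it to your own policy.

```hcl
# Hypothetical ARN; scope this to the secrets your connectors use.
data "aws_iam_policy_document" "manage_proxy_secrets" {
  statement {
    sid = "ManageProxySecrets"
    actions = [
      "secretsmanager:DescribeSecret",
      "secretsmanager:GetSecretValue",
      "secretsmanager:PutSecretValue",
    ]
    resources = [
      "arn:aws:secretsmanager:us-east-1:123456789012:secret:EXAMPLE_PREFIX*",
    ]
  }
}

resource "aws_iam_policy" "manage_proxy_secrets" {
  name   = "ExampleManagePsoxySecrets"
  policy = data.aws_iam_policy_document.manage_proxy_secrets.json
}
```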
beta - This is now available for customer use, but may still change in backwards-incompatible ways.
Our aws-host module provides a vpc_config variable to specify the VPC configuration for the lambdas that our Terraform modules will create, analogous to the vpc_config block supported by the AWS lambda terraform resource.
Some caveats:
API connectors on a VPC must be exposed via API Gateway rather than Function URLs (our Terraform modules will make this change for you).
VPC must be configured such that your lambda has connectivity to AWS services including S3, SSM, and CloudWatch Logs; this is typically done by adding a VPC Endpoint for each service.
VPC must allow any API connector to connect to data source APIs via HTTPS (e.g., port 443); usually these APIs are on the public internet, so this means egress to the public internet.
VPC must allow your API gateway to connect to your lambdas.
The requirements above MAY require you to modify your VPC configuration and/or security groups to support proxy deployment. The example we provide in our vpc.tf should fulfill this if you adapt it; or you can use it as a reference to adapt your existing VPC.
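To make the VPC Endpoint caveat concrete, the sketch below provisions interface endpoints for SSM and CloudWatch Logs and a gateway endpoint for S3. All IDs and the region in the locals block are hypothetical placeholders, and this is not the content of our vpc.tf; treat it as one possible arrangement.

```hcl
# Hypothetical IDs; replace with values from your own VPC.
locals {
  region             = "us-east-1"
  vpc_id             = "vpc-EXAMPLE"
  subnet_ids         = ["subnet-EXAMPLE"]
  security_group_ids = ["sg-EXAMPLE"]
  route_table_ids    = ["rtb-EXAMPLE"]
}

# Interface endpoints; add "secretsmanager" if you store secrets there.
resource "aws_vpc_endpoint" "interface" {
  for_each = toset(["ssm", "logs"])

  vpc_id              = local.vpc_id
  service_name        = "com.amazonaws.${local.region}.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = local.subnet_ids
  security_group_ids  = local.security_group_ids
  private_dns_enabled = true # lets the lambda resolve the service's usual hostname
}

# S3 is reachable through a gateway endpoint attached to route tables.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = local.vpc_id
  service_name      = "com.amazonaws.${local.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = local.route_table_ids
}
```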
To put the lambdas created by our terraform example under a VPC, please follow one of the approaches documented in the next sections.
If you have an existing VPC, you can use it with the vpc_config variable by hard coding the IDs of the pre-existing resources (provisioned outside the scope of your proxy's terraform configuration).
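A sketch of what hard-coded IDs might look like. The subnet_ids and security_group_ids attributes mirror the vpc_config block of the AWS lambda resource; whether the module also expects a vpc_id attribute is an assumption here, and every ID shown is a placeholder.

```hcl
# terraform.tfvars (sketch; all IDs are placeholders)
vpc_config = {
  # ASSUMPTION: the module may also require the VPC's own ID.
  vpc_id             = "vpc-EXAMPLE"
  subnet_ids         = ["subnet-EXAMPLE1", "subnet-EXAMPLE2"]
  security_group_ids = ["sg-EXAMPLE"]
}
```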
vpc.tf
If you don't have a pre-existing VPC you wish to use, our AWS example repo includes a vpc.tf file at the top level. This file contains commented-out terraform resource blocks that can serve as examples for creating the minimal VPC and associated infrastructure. Review and uncomment to meet your use case.
Prerequisites:
the AWS principal (user or role) you're using to run Terraform must have permissions to manage VPCs, subnets, and security groups. The AWS managed policy AmazonVPCFullAccess provides this.
all prerequisites for the API gateways (see api-gateway.md)
NOTE: if you provide vpc_config, the value you pass for use_api_gateway_v2 will be ignored; using a VPC requires API Gateway v2, so our modules will override the value of this flag to true.
Add the following to the "psoxy" module in your main.tf (or uncomment if already present):
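A sketch of the module argument, wired to resources declared in vpc.tf. The resource addresses (aws_default_vpc.default, aws_default_subnet.main, aws_default_security_group.default) and the vpc_id attribute are assumptions; use the names of whatever you actually uncomment in vpc.tf.

```hcl
module "psoxy" {
  # ... your existing source and arguments stay as they are ...

  # ASSUMPTION: resource names below are hypothetical and must match
  # the resources declared in your vpc.tf.
  vpc_config = {
    vpc_id             = aws_default_vpc.default.id
    subnet_ids         = [aws_default_subnet.main.id]
    security_group_ids = [aws_default_security_group.default.id]
  }
}
```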
Uncomment the relevant lines in vpc.tf in the same directory, and modify as you wish. This file brings the default VPC/subnet/security group for your AWS account under terraform.
Alternatively, you can modify vpc.tf to provision a non-default VPC/subnet/security group, and reference those from your main.tf, subject to the caveats above.
See the following terraform resources that you'll likely need:
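As a rough illustration of the default-VPC approach, the AWS provider resources aws_default_vpc, aws_default_subnet, and aws_default_security_group could be combined as below. The availability zone and the single egress rule are assumptions, and this is not a copy of our vpc.tf.

```hcl
# Adopts the account's default VPC into terraform state.
resource "aws_default_vpc" "default" {}

# Adopts the default subnet in one availability zone (placeholder AZ).
resource "aws_default_subnet" "main" {
  availability_zone = "us-east-1a"
}

# Caution: once adopted, terraform removes any rules on the default
# security group that are not declared here.
resource "aws_default_security_group" "default" {
  vpc_id = aws_default_vpc.default.id

  egress {
    description = "HTTPS to data source APIs and AWS service endpoints"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```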
Check your CloudWatch logs for the lambda. The proxy lambda will time out in the INIT phase if SSM Parameter Store or your secret store implementation (AWS Secrets Manager, Vault) is not reachable.
Some potential causes of this:
DNS failure - the lambda looks up the SSM service by domain name; if the DNS zone for the SSM endpoint you've provisioned is not published on the VPC, this will fail. Similarly, if the endpoint wasn't configured on a subnet, it won't have an IP to be resolved.
Connection failure - if the IP is resolved, you should see failures to connect to it in the logs (timeouts); check that your security groups for the lambda/subnet/endpoint allow the bidirectional traffic necessary for your lambda to retrieve data from SSM via the REST API.
Terraform with the aws provider doesn't seem to play nice with lambdas/subnets; the subnet can't be destroyed without destroying the lambda, but terraform seems unaware of this and will just wait forever.
So:
destroy all your lambdas (terraform state list | grep aws_lambda_function; then terraform destroy --target= for each, remembering to quote with '' as needed)
destroy the subnet: terraform destroy --target=aws_subnet.main
https://docs.aws.amazon.com/lambda/latest/dg/foundation-networking.html
https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html
Some organizations require use of API Gateway. This is not the default approach for Psoxy since AWS added support for Lambda Function URLs (March 2022), which are a simpler and more direct way to expose lambdas via HTTPS.
Nonetheless, should you wish to use API Gateway, we provide beta support for this. It is needed if you wish to put your Lambda functions on a VPC (see lambdas-on-vpc.md).
In particular:
The IAM policy that allows API Gateway methods to be invoked by the proxy caller role is defined once, using wildcards, and exposes GET/HEAD/POST methods for all resources. While methods are further constrained by routes and the proxy rules themselves, this could be another enforcement point at the infrastructure level, at the expense of N policies + attachments in your terraform plan instead of 1.
Proxy instances exposed as Lambda Function URLs have a 55s timeout, but API Gateway seems to support 30s as a maximum, so this may cause timeouts with certain APIs.
Prerequisites:
the AWS principal (user or role) you're using to run Terraform must have permissions to provision API gateways. The AWS managed policy AmazonAPIGatewayAdministrator provides this.
Add the following to your terraform.tfvars file:
Then terraform apply should create the API Gateway-related resources, including policies, and destroy the Lambda Function URLs (if you've previously applied with use_api_gateway=false, which is the default).
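Based on the use_api_gateway_v2 flag referenced elsewhere in these guides, the setting would presumably be:

```hcl
# terraform.tfvars
# Expose API connectors via API Gateway v2 instead of Lambda Function URLs.
use_api_gateway_v2 = true
```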
If you wish to use API Gateway V1, you will not be able to use the flag above. Instead, you'll have to do something like the following:
Additionally, you'll need to set a different handler class to be invoked: co.worklytics.psoxy.APIGatewayV1Handler instead of the default, co.worklytics.psoxy.Handler. This can be done in Terraform or by modifying the configuration via the AWS Console.
beta - we don't commit to maintaining this under our versioning policy; minor proxy iterations may require changes to the privileges required in the least-privileged role.
This is a guide about how to create a role for provisioning psoxy infrastructure in AWS, following the principle of least-privilege at permission-level, rather than policy-level.
Eg, as of v0.4.55 of the proxy, our docs provide guidance on using an AWS role to provision your psoxy infrastructure using the least-privileged set of AWS managed policies possible. A stronger standard would be to use a custom IAM policy rather than AWS managed policy, with the least-privileged set of permissions required.
Additionally, you can specify resource constraints to improve security within a shared AWS account. (However, we do not recommend or officially support deployment into a shared AWS account. We recommend deploying your proxy instances in an isolated AWS account to provide an implicit security boundary by default, as an additional layer of protection beyond those provided by our proxy modules.)
We provide an example IAM policy document in our psoxy-constants module that you can use to create an IAM policy in AWS. You can do this outside terraform, by copying the JSON from that policy, OR via terraform as follows:
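A sketch of the terraform route. The module source path, the version ref, the output name aws_least_privileged_policy, and the policy name are all assumptions made for illustration; check the psoxy-constants module's outputs for the actual name before using this.

```hcl
# ASSUMPTION: source path and ref are illustrative; pin to the release you use.
module "psoxy_constants" {
  source = "git::https://github.com/worklytics/psoxy//infra/modules/psoxy-constants?ref=v0.4.55"
}

resource "aws_iam_policy" "min_provisioner_policy" {
  name = "ExamplePsoxyMinProvisioner" # hypothetical name

  # ASSUMPTION: output name is hypothetical; it should be the JSON policy
  # document exposed by the psoxy-constants module.
  policy = module.psoxy_constants.aws_least_privileged_policy
}
```

The resulting policy can then be attached to the role you use to run terraform, in place of the AWS managed policies.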