This guide provides a roadmap of a typical implementation with Worklytics-provided support.
30-60 min video call to get overview of process, responsibilities
Attendees:
Product Stakeholder(s)
Data Source Administrator(s), if identified
IT Admin(s), if identified
Agenda:
determine data sources, and who can authorize access to each
determine host platform (GCP or AWS)
identify who has the permissions to manage infra, will be able to run Terraform, and how they'll run it (where, authenticated how)
scope desired data interval, approximate headcount, etc.
identify any potential integration issues or infrastructure constraints
1-2 hr video call to walk through customization and initial terraform runs via screenshare
Attendees:
IT Admin(s) who will be running Terraform
Worklytics technical contact
Prior to this call, please follow the initial steps in the Getting Started section for your host platform and ensure you have all Prereqs
Goals:
get example customized and a terraform plan working.
run terraform apply. Obtain the TODO 1 files you can send to your data source administrators to complete, as needed.
Tips:
Works best if we screenshare
can be completed without call; but Worklytics can assist if desired
follow the TODO 2 files / use the test *.sh shell scripts produced by terraform apply
validate that authentication/authorization is correct for all connections, and that you're satisfied with proxy behavior
Further guidance on proxy testing:
https://docs.worklytics.co/psoxy/guides/testing
can be completed without call; but Worklytics can assist if desired
Authorize Worklytics to invoke API connectors and access sanitized bulk data:
obtain service account ID of your tenant from Worklytics (via Worklytics web portal)
configure it in your terraform.tfvars file (details below)
run terraform apply again to update IAM policy to reflect the change
For the AWS-hosted case, add the numeric ID of your Worklytics tenant's service account to the list caller_gcp_service_account_ids in your terraform.tfvars.
For the GCP-hosted case, add the email address of your Worklytics tenant's service account to the list worklytics_sa_emails in your terraform.tfvars.
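For illustration, the resulting terraform.tfvars entries might look like the output of the sketch below. The helper name tfvars_list_entry and both values are placeholders, not real Worklytics identifiers; use the service account ID or email shown for your tenant in the Worklytics web portal.

```shell
# Print a terraform.tfvars list assignment holding a single string value.
# Usage: tfvars_list_entry <variable_name> <value>
# (hypothetical helper, only here to show the shape of the entry)
tfvars_list_entry() {
  printf '%s = [\n  "%s"\n]\n' "$1" "$2"
}

# AWS-hosted case: numeric service account ID (placeholder value)
tfvars_list_entry caller_gcp_service_account_ids "123456789012345678901"

# GCP-hosted case: service account email (placeholder value)
tfvars_list_entry worklytics_sa_emails "your-tenant-sa@example.iam.gserviceaccount.com"
```

If the list already exists in your terraform.tfvars, add the value to it rather than declaring the variable a second time.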
can be completed without call; but Worklytics can assist if desired
follow the TODO 3 files (or terraform output values) generated by the terraform apply command
if you do not have access to Worklytics, or you do but lack the Data Connection Admin role, send these files to the appropriate person
If you're using Terraform Cloud or Enterprise, here are a few things to keep in mind.
NOTE: this is tested only for GCP; for AWS, your mileage may vary. In particular, we expect Microsoft 365 sources will not work properly, given how those are authenticated.
Prereqs:
git/java/maven, as described here https://github.com/Worklytics/psoxy#required-software-and-permissions
for testing, you'll need the CLI of your host environment (eg, AWS CLI, GCloud CLI, Azure CLI) as well as npm/NodeJS installed on your local machine
After authenticating your terraform CLI to Terraform Cloud/enterprise, you'll need to:
1. Create a Project in Terraform Cloud, and a workspace within the project.
2. Clone one of our example repos and run the ./init script to initialize your terraform.tfvars for Terraform Cloud. This will also put a bunch of useful tooling on your machine.
3. Commit the bundle that was output by the ./init script to your repo.
4. Change the terraform backend in main.tf to point to your Terraform Cloud rather than be local:
remove the backend block from main.tf
add a cloud block within the terraform block in main.tf (obtain content from your Terraform Cloud)
run terraform init to migrate the initial "local" state to the remote state in Terraform Cloud
You'll have to authenticate your Terraform Cloud with Google / AWS / Azure, depending on the cloud you're deploying to / data sources you're using.
If you're using Terraform Cloud or Enterprise, our convention of writing "TODOs" to the local file system might not work for you.
To address this, we've updated most of our examples to also output todo values as Terraform outputs: todos_1, todos_2, etc.
To get them onto your local machine, do something like the following:
get an API token from your Terraform Cloud or Enterprise instance (eg, https://developer.hashicorp.com/terraform/cloud-docs/users-teams-organizations/api-tokens).
set it as an env variable, as well as the host:
run a curl command using those values to get each todos:
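The steps above might be sketched as follows. This is a hedged example, not the project's own script: the environment variable names (TFC_HOST, TFC_TOKEN, TFC_WORKSPACE_ID) and the function names are assumptions made for illustration, and it relies on curl and jq being installed. It reads from the Terraform Cloud API endpoint that lists the outputs of a workspace's current state version.

```shell
# Placeholders: set these to your own values (variable names are assumptions).
# export TFC_HOST="app.terraform.io"   # or your Terraform Enterprise host
# export TFC_TOKEN="..."               # API token obtained in the first step
# export TFC_WORKSPACE_ID="ws-..."     # ID of your workspace

# Build the URL that lists the outputs of the workspace's current state.
# Usage: tfc_outputs_url <host> <workspace_id>
tfc_outputs_url() {
  printf 'https://%s/api/v2/workspaces/%s/current-state-version-outputs' "$1" "$2"
}

# Fetch one named output (eg, todos_1) and print its value.
# Usage: tfc_get_output <output_name>
tfc_get_output() {
  curl --silent --fail \
    --header "Authorization: Bearer ${TFC_TOKEN}" \
    --header "Content-Type: application/vnd.api+json" \
    "$(tfc_outputs_url "${TFC_HOST}" "${TFC_WORKSPACE_ID}")" \
  | jq -r --arg name "$1" \
      '.data[] | select(.attributes.name == $name) | .attributes.value'
}

# Example: tfc_get_output todos_1 > todos_1.md
```

Outputs marked as sensitive are not returned in plain text by this endpoint, so this only works for non-sensitive outputs.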
If you have the terraform CLI auth'd against your Terraform Cloud or Enterprise instance, then you might be able to avoid the curl-hackery above, and instead use the following:
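A minimal sketch of that CLI-based approach, assuming the todos outputs are plain string values (which terraform output -raw requires). The function name save_todos and the .md file names are illustrative choices, not something the examples prescribe.

```shell
# Save each named Terraform output to <name>.md in the current directory.
# Run from the root of your Terraform configuration.
# Usage: save_todos todos_1 todos_2 todos_3
save_todos() {
  for name in "$@"; do
    terraform output -raw "$name" > "${name}.md" || return 1
  done
}
```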
(This approach should also work with the Terraform CLI running with a backend block, rather than a cloud block.)
As Terraform Cloud runs remotely, the test tool we provide for testing your deployment will not be available by default on your local machine. You can install it locally and adapt the suggestions from the todos_2 output variable of your terraform run to test your deployment from your local machine or another environment. See testing.md for details.
If you have run our init script locally (as suggested in 'Getting Started') then the test tool should have been installed (likely at .terraform/modules/psoxy/tools/). You will need to update everything in todos_2.md to point to this path for those test commands to work.
If you need to directly install/re-install it, something like the following should work:
By default, the Terraform examples provided by Worklytics install a NodeJS-based tool for testing your proxy deployments.
Full documentation of the test tool is available here. And the code is located in the tools
directory of the Psoxy repository.
Wherever you run this test tool from, your AWS or GCloud CLI must be authenticated as an entity with permissions to invoke the Lambda functions / Cloud functions that you deployed for Psoxy.
If you're testing the bulk cases, the entity must be able to read/write to the cloud storage buckets created for each of those bulk examples.
If you're running the Terraform examples in a different location from where you wish to run tests, then you can install the tool alone:
Clone the Psoxy repo to your local machine:
From within that clone, install the test tool:
Get specific test commands for your deployment
If you set the todos_as_outputs variable to true, your Terraform apply run should contain a todos_2 output variable with testing instructions.
If you set the todos_as_local_files variable to true, your Terraform apply run should produce local files named TODO 2 ... with testing instructions.
In both cases, you will need to replace the test tool path included there with the path to your installation.
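Steps 1 and 2 above might be sketched as follows. The function name is hypothetical, and the tool's exact sub-directory inside the repository's tools directory is left as a parameter because it is not stated here; check the repository for the actual path.

```shell
# Clone the Psoxy repo and install the test tool's npm dependencies.
# Usage: install_psoxy_test_tool <clone_dir> <tool_subdir>
#   <clone_dir>   where to clone the repository
#   <tool_subdir> path of the test tool inside the clone (under tools/)
install_psoxy_test_tool() {
  git clone https://github.com/Worklytics/psoxy.git "$1" || return 1
  # --prefix makes npm install dependencies in the given directory
  npm install --prefix "$1/$2"
}
```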
Example commands of the primary testing tool: "Psoxy Test Calls"
If you used an approach other than Terraform, or did not directly use our Terraform examples, you may not have the testing examples or the test tool installed on your machine.
In such a case, you can install the test tool manually by following steps 1+2 above, and then can review the documentation on how to use it from your machine.
Node.js testing tool for Worklytics Psoxy.
We provide a collection of Node.js scripts to help you test your Worklytics Psoxy deploy. The scripts require Node.js (version >=16) and npm (version >=8). First of all, install the npm dependencies with npm i.
The primary tool is a command line interface (CLI) script that allows you to execute "Psoxy Test Calls" to your Worklytics Psoxy instance. Check all the available options by running node cli-call.js -h (*).
We also provide a script to test "Psoxy bulk instances": they consist of an input bucket, an output one, and the Psoxy instance itself. The script allows you to upload a comma-separated values (CSV) file to the input bucket; it then checks that the Psoxy has processed the file and written it to the output bucket, removing all Personally Identifiable Information (PII) from the file (as per Psoxy rules). Check available options by running node cli-file-upload.js -h (*).
A third script lets you check your Psoxy instance logs: node cli-logs.js -h (*).
(*) Options may vary depending on whether you've deployed the Worklytics Psoxy to Amazon Web Services (AWS) or Google Cloud Platform (GCP).
Assuming that you've successfully deployed the Psoxy to AWS, and you've configured Google Calendar as data source, let's see an example:
The -r option is mandatory for AWS deploys, and identifies the Amazon Resource Name (ARN) of the "role" that will be assumed (*) to be able to execute the call. The -u option is the URL you want to test. In this case, the URL's path matches a Google Calendar API endpoint (access the primary calendar of the currently logged-in user). The -i option identifies the user "to impersonate"; this option is only relevant for Google Workspace data sources.
Another example for Zoom:
As you can see, the differences are:
As this is not a Google Workspace data source, you don't need the -i option.
The URL's path matches a Zoom API endpoint in this case.
(*) Requests to AWS API need to be signed, so you must ensure that the machine running these scripts has the appropriate AWS credentials for the role you've selected.
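The two AWS cases described above might be run along these lines. This is a sketch: the wrapper function is hypothetical, and the role ARN, proxy hosts, and user email in the usage comments are placeholders to replace with the values from your own deployment. Only the -r, -u, and -i options described above are used.

```shell
# Run a Psoxy test call against an AWS deployment.
# Run from the directory that contains cli-call.js.
# Usage: psoxy_test_call_aws <role_arn> <url> [user_to_impersonate]
psoxy_test_call_aws() {
  if [ -n "${3:-}" ]; then
    # Google Workspace sources: also pass the user to impersonate
    node cli-call.js -r "$1" -u "$2" -i "$3"
  else
    node cli-call.js -r "$1" -u "$2"
  fi
}

# Google Calendar (placeholder values):
#   psoxy_test_call_aws "arn:aws:iam::123456789012:role/YourPsoxyCallerRole" \
#     "https://<your-gcal-proxy-host>/calendar/v3/calendars/primary" \
#     "user@your-domain.example"
#
# Zoom (placeholder values, no -i needed):
#   psoxy_test_call_aws "arn:aws:iam::123456789012:role/YourPsoxyCallerRole" \
#     "https://<your-zoom-proxy-host>/v2/users"
```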
For GCP, every call needs an "identity token" (-t, --token option in the examples below) for the account that has access to the Cloud Platform (*). If you omit the token, the script will try to get it automatically, so you must authorize gcloud first.
Google Calendar example:
Zoom example:
Outlook Calendar example (token option omitted):
(*) You can obtain it by running gcloud auth print-identity-token (using Google Cloud SDK)
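A hedged sketch of the GCP calls described above: the wrapper function is hypothetical and the URLs and email in the usage comments are placeholders. It passes the identity token explicitly with -t; per the text above, the option can also be omitted so that the script obtains the token itself.

```shell
# Run a Psoxy test call against a GCP deployment, passing an identity token.
# Run from the directory that contains cli-call.js; gcloud must be authorized.
# Usage: psoxy_test_call_gcp <url> [user_to_impersonate]
psoxy_test_call_gcp() {
  psoxy_token="$(gcloud auth print-identity-token)" || return 1
  if [ -n "${2:-}" ]; then
    # Google Workspace sources: also pass the user to impersonate
    node cli-call.js -u "$1" -t "$psoxy_token" -i "$2"
  else
    node cli-call.js -u "$1" -t "$psoxy_token"
  fi
}

# Google Calendar (placeholder values):
#   psoxy_test_call_gcp "https://<your-gcal-function-url>/calendar/v3/calendars/primary" \
#     "user@your-domain.example"
#
# Zoom (placeholder value):
#   psoxy_test_call_gcp "https://<your-zoom-function-url>/v2/users"
```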
Use the --health-check option to check if your deploy is correctly configured:
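As a sketch, a health check might be invoked as below. The wrapper function and the URL and ARN in the usage comments are placeholders; the role argument applies only to AWS deploys, where -r is mandatory.

```shell
# Run a health check against a Psoxy instance.
# Run from the directory that contains cli-call.js.
# Usage: psoxy_health_check <url> [aws_role_arn]
psoxy_health_check() {
  if [ -n "${2:-}" ]; then
    # AWS deploys: the role to assume is required
    node cli-call.js -u "$1" -r "$2" --health-check
  else
    # GCP deploys: token omitted, so the script tries to obtain it via gcloud
    node cli-call.js -u "$1" --health-check
  fi
}

# AWS (placeholder values):
#   psoxy_health_check "https://<your-zoom-proxy-host>" \
#     "arn:aws:iam::123456789012:role/YourPsoxyCallerRole"
```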
Example response for Zoom:
The -d, --data-source option of our CLI script allows you to test all the endpoints for a given data source (available data sources are listed in the script's help: -h option). The only difference with the previous examples is that the -u, --url option has to be the URL of the deploy without the corresponding API path of the data source:
Notice that only the URL changes; any other options the Psoxy may need stay the same.
Assuming that you've successfully deployed the Psoxy to AWS, you can inspect the logs by running the following command:
Use the following command to review the runtime logs of your Psoxy deploy to GCP:
The <projectId> option is the Google Cloud project identifier that hosts your Psoxy deploy, and the <functionName> option is the identifier of the Cloud Function that represents the Psoxy instance itself.
Assuming that you've successfully deployed the Psoxy "bulk instance" to AWS, you need to provide the script with a CSV example file containing some PII records, the name of the input bucket and the output one (these are expected to be S3 buckets in the same AWS region). The script also needs the AWS region (default is us-east-1), and the ARN of the role that will be assumed to perform the upload and download operations.
Example:
Use the following command to test a Psoxy "bulk" instance deployed to GCP:
In this case, the -i and -o options represent Google Cloud Storage buckets.
The testing script will rename the files you upload by appending a timestamp value as suffix: my-test-file.csv will appear as my-test-file-{timestamp}.csv in both the input and output buckets. This is done to avoid conflicts with files that may already exist in the buckets.
By default, the sanitized file will be deleted from the output bucket after the comparison test (original file vs. sanitized one). Run node cli-file-upload.js -h to see all the available options (keep sanitized file in the output bucket, save it to disk, etc).
This document describes how to migrate your deployment from one cloud provider to another, or one project/account to another. It does not cover migrating between proxy versions.
Use cases:
move from a dev account to a prod account (Account / Project Migration)
move from a "shared" account to a "dedicated" account (Account / Project Migration)
move from AWS --> GCP, and vice versa (Provider Migration)
Some data/infrastructure MUST, or at least SHOULD be preserved during your migration. Below is an enumeration of both cases.
Data, such as configuration values, can generally be copied; you just need to make a new copy of it in the new environment managed by the new Terraform configuration.
Some infrastructure, such as API Clients, will be moved; ie, the same underlying resource will continue to exist, it will just be managed by the new Terraform configuration instead of the old one. This is the more tedious case, as you must both import this infrastructure to your new configuration and then rm (remove) it from your old configuration, rather than having it be destroyed when you tear down the old configuration. You should carefully review every terraform apply, including terraform destroy commands, to ensure that infrastructure you intend to move is not destroyed or replaced (eg, terraform sees it as tainted, and does a destroy + create within a single apply operation).
What you MUST copy:
SALT value. This is a secret used to generate the pseudonyms. If this is lost/destroyed, you will be unable to link any data pseudonymized with the original salt to data you process in the future.
NOTE: the underlying resource to preserve is actually a random_password resource, not an SSM parameter / GCP Secret, because those are simply filled from the terraform random_password resource; if you import the parameter/secret, but not the random_password, Terraform will generate a new value and overwrite the parameter/secret.
as of v0.4.35 examples, the terraform resource ID for this value is expected to be module.psoxy.module.psoxy.random_password.pseudonym_salt; if not, you can search for it with terraform state list | grep random_password
value for PSEUDONYMIZE_APP_IDS. This value, if set to true, will have the proxy use a rule set that pseudonymizes identifiers issued by source applications themselves in some cases where these identifiers aren't inherently PII, but the association could be considered discoverable.
value for EMAIL_CANONICALIZATION. Prior to v0.4.52, the default was in effect STRICT; so if your original deployment was built on a version prior to this, you should explicitly set this value to STRICT in your new configuration (likely the email_canonicalization variable in terraform modules)
any custom sanitization rules that you've set, either in your Terraform configuration or directly as the value of a RULES environment variable, SSM Parameter, or GCP Secret.
historical sanitized files for any bulk connectors, if you wish to continue to have this data analyzed by Worklytics. (eg, everything from all your -sanitized buckets)
NOTE: you do NOT need to copy the ENCRYPTION_KEY value; rotation of this value should be expected by clients.
What you SHOULD move:
API Clients. Whether generated by Terraform or not, the "API Client" for a data source must typically be authorized by a data source administrator to grant it access to the data source. As such, if you destroy the client, or lose its id, you'll need to coordinate with the administrator again to recreate it / obtain the configuration information.
as of v0.4.35, Google Workspace and Microsoft 365 API clients are managed directly by Terraform, so these are important to preserve.
What you SHOULD copy:
API Client Secrets, if generated outside of Terraform. If you destroy/lose these values, you'll need to contact the data source administrator to obtain new versions.
Prior to beginning your migration, you should make a list of what existing infrastructure and/or configuration values you intend to move/copy.
The following is a rough guide on the steps you need to take to migrate your deployment.
Salt value. If using an example forked from our template repos at v0.4.35 or later, you can find the output block in your main.tf for pseudonym_salt, uncomment it, and run terraform apply. You'll then be able to obtain the value with: terraform output --raw pseudonym_salt
On macOS, you can copy the value to your clipboard with: terraform output --raw pseudonym_salt | pbcopy
Microsoft 365 API client, if any:
Find the resource ids: terraform state list | grep "\.azuread_application\."
For each, obtain its objectId: terraform state show 'module.psoxy.module.msft-connection["azure-ad"].azuread_application.connector'
Prepare import command for each client for your new configuration, eg: terraform import 'module.psoxy.module.msft-connection["azure-ad"].azuread_application.connector' '<objectId>'
Google Workspace API clients, if any:
Find the resource ids: terraform state list | grep 'google_service_account\.connector-sa'
For each, obtain its unique_id: terraform state show 'module.worklytics_connectors_google_workspace.module.google_workspace_connection["gdirectory"].google_service_account.connector-sa'
Prepare import command for each client for your new configuration, eg: terraform import 'module.worklytics_connectors_google_workspace.module.google_workspace_connection["gdirectory"].google_service_account.connector-sa' '<unique_id>'
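If you have many API clients, the find / show / prepare steps above could be scripted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, it parses the human-readable output of terraform state show (so it may need adjusting for your Terraform version), and it assumes the ID appears as a quoted top-level attribute. Review every generated command before running it in the new configuration.

```shell
# Print a `terraform import` command for each resource in the current (old)
# configuration's state whose address matches a pattern.
# Usage: print_import_commands <grep_pattern> <id_attribute>
print_import_commands() {
  esc="$(printf '\033')"
  terraform state list | grep "$1" | while IFS= read -r addr; do
    # Strip any ANSI color codes, then take the first quoted value
    # of the requested attribute.
    id="$(terraform state show "$addr" < /dev/null \
      | sed "s/${esc}\[[0-9;]*m//g" \
      | sed -n "s/^ *$2 *= *\"\(.*\)\"\$/\1/p" \
      | head -n 1)"
    printf "terraform import '%s' '%s'\n" "$addr" "$id"
  done
}

# Google Workspace API clients:
#   print_import_commands 'google_service_account\.connector-sa' unique_id
```

For Microsoft 365 clients, the same function could be called with the azuread_application pattern and the name of the state attribute that holds the objectId (assumed here to be object_id; confirm against your terraform state show output).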
Create a new Terraform configuration from scratch; run terraform init there (if you begin with one of our examples, our init script does this). Use the terraform.tfvars of your existing configuration as a guide for what variables to set, copying over any needed values.
Run a provisional terraform plan and review.
Run the imports you prepared in Phase 1; if all appear OK, run another terraform plan and review (comparing to the old one).
Optionally, run terraform plan -out=plan.out to create a plan file; if you send this, along with all the *.tf/*.tfvars files, to Worklytics, we can review it and confirm that it is correct.
Run terraform apply to create the new infrastructure; re-confirm that the plan is not re-creating any API clients/etc that you intended to preserve
Via AWS / GCP console, or CLIs, move the values of any secrets/parameters that you intend to preserve by directly reading the values from your old account/project and copying them into the new account/project
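For the AWS case, copying a single SSM parameter by CLI might look like the sketch below. The function name, the AWS CLI profile names, and the parameter names are all hypothetical, and it assumes the value should be stored as a SecureString in the new account. A GCP migration would use the Secret Manager equivalents instead.

```shell
# Copy one SSM parameter value from the old AWS account to the new one.
# Usage: copy_ssm_parameter <old_profile> <new_profile> <old_name> <new_name>
copy_ssm_parameter() {
  # Read the decrypted value using credentials for the old account
  ssm_value="$(aws ssm get-parameter --profile "$1" --name "$3" \
    --with-decryption --query 'Parameter.Value' --output text)" || return 1
  # Write it using credentials for the new account
  aws ssm put-parameter --profile "$2" --name "$4" \
    --type SecureString --value "$ssm_value" --overwrite
}

# Example (placeholder profile and parameter names):
#   copy_ssm_parameter old-account new-account /old/path/RULES /new/path/RULES
```

If the parameter in the new account is managed by Terraform, check your next terraform plan to make sure the copied value will not be overwritten.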
Look at the TODO 3 files/output variables for all your connectors. Make a mapping between the old values and the new values. Send this to Worklytics. It should include for each the proxy URLs, AWS Role to use, and any other values that are changing.
Wait for confirmation that Worklytics has migrated all your connections to the new values. This may take 1-2 days.
Remove references to any API Clients you migrated in Phase 1:
eg, terraform state rm 'module.psoxy.module.msft-connection["azure-ad"].azuread_application.connector'
run terraform destroy in the old configuration. Carefully review the plan before confirming.
if you're using Google Workspace sources, you may see destruction of google_project_service resources; if you allow these to be destroyed, these APIs will be disabled; if you are using the same GCP project in your other configuration, you should run terraform apply there again to re-enable them.
You may also destroy any API clients/etc that are managed outside of Terraform and which you did not migrate to the new environment.
You may clean up any configuration values, such as SSM Parameters / GCP Secrets to customize the proxy rules sets, that you may have created in your old host environment.
There are two approaches to upgrade your Proxy to a newer version.
In both cases, you should carefully review your next terraform plan or terraform apply for changes to ensure you understand what will be created, modified, or destroyed by the upgrade.
If you have doubts, review CHANGELOG.md for highlights of significant changes in each version, and the detailed release notes for each release:
https://github.com/Worklytics/psoxy/releases
upgrade-terraform-modules Script
If you originally used one of our example repos (psoxy-example-aws or psoxy-example-gcp, etc), starting from version v0.4.30, you can use the following command, leveraging a script created when you initialized the example:
This will update all the version references throughout your example, and offer you a command to revert if you later wish to do so.
Open each .tf file in the root of your configuration. Find all module references ending in a version number, and update them to the new version.
Eg, look for something like the following:
update the v0.4.37 to v0.4.46:
Then run terraform init after saving the file to download the new version of each module.
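The manual edit described above could also be done with a small script, sketched below. The function name is hypothetical, and it simply replaces every occurrence of the old version string in the root .tf files, so review the resulting diff before running terraform init.

```shell
# Replace every occurrence of one version string with another in the .tf
# files at the root of a configuration (originals are kept as *.bak).
# Usage: bump_module_version <dir> <old_version> <new_version>
bump_module_version() {
  # Escape dots so the old version is matched literally by sed
  old_re="$(printf '%s' "$2" | sed 's/\./\\./g')"
  for f in "$1"/*.tf; do
    [ -e "$f" ] || continue
    sed -i.bak "s/${old_re}/$3/g" "$f"
  done
}

# Example: bump_module_version . v0.4.37 v0.4.46
```

Because the match is a plain substring, a longer version such as v0.4.370 would also be rewritten; this is acceptable for a one-off upgrade but worth checking.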
Done with your Psoxy deployment?
Terraform makes it easy to clean up when you're through with Psoxy, or if you wish to rebuild everything from scratch.
First, a few caveats:
this will NOT undo any changes outside of Terraform, even those we instructed you to perform via TODO - files that Terraform may have generated.
be careful with anything you created outside of Terraform and later imported into Terraform, such as GCP project / AWS account themselves. If you DON'T want to destroy these, do terraform state rm <resource> (analogue of the import) for each.
Do the following to destroy your Psoxy infra:
open the main.tf of your terraform configuration; remove ALL blocks that aren't terraform or provider. You'll be left with ~30 lines that look like the following.
NOTE: do not edit your terraform.tfvars file or remove any references to your AWS / Azure / GCP accounts; Terraform needs to be authenticated and know where to delete stuff from!
run terraform apply. It'll prompt you with a plan that says "0 to add, 0 to change" and then some huge number of things to destroy. Type 'yes' to apply it.
That's it. It should remove all the Terraform infra you created.
if you want to rebuild from scratch, revert your changes to main.tf (git checkout main.tf) and then terraform apply again.