Deployment Migration

Overview

This document describes how to migrate your deployment from one cloud provider to another, or from one project/account to another. It does not cover migrating between proxy versions.

Use cases:

  • move from a dev account to a prod account (Account / Project Migration)

  • move from a "shared" account to a "dedicated" account (Account / Project Migration)

  • move from AWS to GCP, or vice versa (Provider Migration)

Preparation

Preserving Existing Infrastructure

Some data/infrastructure MUST, or at least SHOULD, be preserved during your migration. Both cases are enumerated below.

Data, such as configuration values, can generally be copied; you just need to make a new copy of it in the new environment managed by the new Terraform configuration.

Some infrastructure, such as API Clients, will be moved; ie, the same underlying resource continues to exist, but is managed by the new Terraform configuration instead of the old one. This is the more tedious case: you must import the resource into your new configuration and then remove it from the state of your old configuration (terraform state rm), so that it is not destroyed when you tear down the old configuration. Carefully review the plan shown by every terraform apply and terraform destroy command to ensure that infrastructure you intend to move is neither destroyed nor replaced (eg, Terraform sees it as tainted, and does a destroy + create within a single apply operation).
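The review described above can be partly scripted. The following is a minimal sketch rather than official tooling: it assumes Terraform 0.14 or later (for the -chdir option) and that the plan text marks affected resources with the phrases "will be destroyed" or "must be replaced". The function name flag_destructive_changes and the directory argument are hypothetical.

```shell
# Sketch: list the resources that a plan would destroy or replace.
# Assumption: the plan text marks them with lines such as
#   "# <address> will be destroyed" or "# <address> must be replaced".

flag_destructive_changes() {
  # $1: directory of the Terraform configuration to check (hypothetical layout)
  local config_dir plan_text
  config_dir="$1"
  plan_text="$(terraform -chdir="$config_dir" plan -no-color)" || return 2
  if printf '%s\n' "$plan_text" | grep -E '^ *# .* (will be destroyed|must be replaced)'; then
    return 1   # at least one resource would be destroyed or replaced
  fi
  echo "no destroy/replace actions found in plan for $config_dir"
}

# Example (hypothetical path):
#   flag_destructive_changes ./new-env
```

The function prints the matching plan lines and returns 1 when anything would be destroyed or replaced. It is a supplement to reading the full plan, not a substitute for it.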

What you MUST copy:

  • SALT value. This is a secret used to generate the pseudonyms. If this is lost/destroyed, you will be unable to link any data pseudonymized with the original salt to data you process in the future.

    • NOTE: the underlying resource to preserve is the Terraform random_password resource, not the SSM Parameter / GCP Secret, because those are simply filled from the random_password value. If you import the parameter/secret but not the random_password, Terraform will generate a new value and overwrite the parameter/secret.

    • as of v0.4.35 examples, the terraform resource ID for this value is expected to be module.psoxy.module.psoxy.random_password.pseudonym_salt; if not, you can search for it with terraform state list | grep random_password

  • value for PSEUDONYMIZE_APP_IDS. If set to true, the proxy uses a rule set that pseudonymizes identifiers issued by source applications themselves, in some cases where these identifiers aren't inherently PII but the association could be considered discoverable.

  • value for EMAIL_CANONICALIZATION. Prior to v0.4.52, the default was in effect STRICT; if your original deployment was built on an earlier version, you should explicitly set this value to STRICT in your new configuration (likely via the email_canonicalization variable in the Terraform modules).

  • any custom sanitization rules that you've set, either in your Terraform configuration or directly as the value of a RULES environment variable, SSM Parameter, or GCP Secret.

  • historical sanitized files for any bulk connectors, if you wish to continue to have this data analyzed by Worklytics. (eg, everything from all your -sanitized buckets)

NOTE: you do NOT need to copy the ENCRYPTION_KEY value; rotation of this value should be expected by clients.
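For the salt, the carry-over can be sketched as follows. This is a sketch under several assumptions that you should check against your own setup: the pseudonym_salt output has been uncommented and applied in the old configuration, your version of the random provider supports importing a random_password by passing the password value as the import ID, and Terraform 0.14 or later is in use (for -chdir). The function names and directory arguments are hypothetical.

```shell
# Sketch: carry the pseudonym salt from the old configuration into the new one.

# List candidate salt resources in a configuration's state.
# $1: config dir
list_random_passwords() {
  terraform -chdir="$1" state list | grep 'random_password'
}

# Import the old salt value into the new configuration.
# $1: old config dir, $2: new config dir, $3: address of the salt resource in the new config
import_salt() {
  local salt
  salt="$(terraform -chdir="$1" output -raw pseudonym_salt)" || return 1
  [ -n "$salt" ] || { echo "empty salt; is the pseudonym_salt output uncommented?" >&2; return 1; }
  # Assumption: random_password accepts the password value as its import ID.
  # Note: the value is passed as a command-line argument, so run this only on a trusted machine.
  terraform -chdir="$2" import "$3" "$salt"
}

# Example (hypothetical paths; address as expected for v0.4.35 examples):
#   list_random_passwords ./old-env
#   import_salt ./old-env ./new-env 'module.psoxy.module.psoxy.random_password.pseudonym_salt'
```

After the import, run terraform plan in the new configuration and confirm that the random_password resource is not scheduled for replacement.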

What you SHOULD move:

  • API Clients. Whether generated by Terraform or not, the "API Client" for a data source must typically be authorized by a data source administrator to grant it access to the data source. As such, if you destroy the client, or lose its id, you'll need to coordinate with the administrator again to recreate it / obtain the configuration information.

    • as of v0.4.35, Google Workspace and Microsoft 365 API clients are managed directly by Terraform, so these are important to preserve.

What you SHOULD copy:

  • API Client Secrets, if generated outside of Terraform. If you destroy/lose these values, you'll need to contact the data source administrator to obtain new versions.

Prior to beginning your migration, you should make a list of what existing infrastructure and/or configuration values you intend to move/copy.

Migration Plan

The following is a rough guide on the steps you need to take to migrate your deployment.

Phase 1: Gather Information from Existing Environment

  1. Salt value. If you are using an example forked from our template repos at v0.4.35 or later, find the output block for pseudonym_salt in your main.tf, uncomment it, and run terraform apply. You can then obtain the value with:

       terraform output --raw pseudonym_salt

     On macOS, you can copy the value to your clipboard with:

       terraform output --raw pseudonym_salt | pbcopy

  2. Microsoft 365 API client, if any:

    • Find the resource ids: terraform state list | grep "\.azuread_application\."

    • For each, obtain its objectId: terraform state show 'module.psoxy.module.msft-connection["azure-ad"].azuread_application.connector'

    • Prepare import command for each client for your new configuration, eg: terraform import 'module.psoxy.module.msft-connection["azure-ad"].azuread_application.connector' '<objectId>'

  3. Google Workspace API clients, if any:

    • Find the resource ids: terraform state list | grep 'google_service_account\.connector-sa'

    • For each, obtain its unique_id: terraform state show 'module.worklytics_connectors_google_workspace.module.google_workspace_connection["gdirectory"].google_service_account.connector-sa'

    • Prepare import command for each client for your new configuration, eg: terraform import 'module.worklytics_connectors_google_workspace.module.google_workspace_connection["gdirectory"].google_service_account.connector-sa' '<unique_id>'
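Steps 2 and 3 above follow the same pattern, so the import commands can be generated rather than assembled by hand. The sketch below is illustrative only. It assumes Terraform 0.14 or later (for -chdir), that terraform state show prints attributes as name = "value" lines, and that the ID you need appears under the attribute name you pass in (object_id is assumed here for the objectId mentioned above; verify it in your own output). The function names are hypothetical.

```shell
# Sketch: print a `terraform import` command for each API client in the old state.

# Print the value of the first attribute named $3 in `terraform state show` output.
# $1: config dir, $2: resource address, $3: attribute name
state_attribute() {
  local esc
  esc="$(printf '\033')"
  terraform -chdir="$1" state show "$2" </dev/null \
    | sed "s/${esc}\[[0-9;]*m//g" \
    | awk -v attr="$3" '$1 == attr && $2 == "=" { gsub(/"/, "", $3); print $3; exit }'
}

# Print one import command per matching resource address.
# $1: old config dir, $2: grep pattern for addresses, $3: attribute used as the import ID
print_import_commands() {
  terraform -chdir="$1" state list | grep "$2" | while IFS= read -r address; do
    id="$(state_attribute "$1" "$address" "$3")"
    printf "terraform import '%s' '%s'\n" "$address" "$id"
  done
}

# Examples (hypothetical path; attribute names are assumptions to verify):
#   print_import_commands ./old-env '\.azuread_application\.' object_id
#   print_import_commands ./old-env 'google_service_account\.connector-sa' unique_id
```

The sed step strips any color codes before awk reads the attribute. Review the printed commands before running them in the new configuration; an empty ID means the attribute name did not match.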

Phase 2: Create New Environment

  1. Create a new Terraform configuration from scratch; run terraform init there (if you begin with one of our examples, our init script does this). Use the terraform.tfvars of your existing configuration as a guide for what variables to set, copying over any needed values.

  2. Run a provisional terraform plan and review.

  3. Run the imports you prepared in Phase 1. If all appear OK, run another terraform plan and review it, comparing it to the previous one.

  4. Optionally, run terraform plan -out=plan.out to create a plan file; if you send this, along with all the *.tf/*.tfvars files, to Worklytics, we can review it and confirm that it is correct.

  5. Run terraform apply to create the new infrastructure; before confirming, re-check that the plan does not re-create any API clients or other resources that you intended to preserve.

  6. Via the AWS / GCP console, or CLIs, copy the values of any secrets/parameters that you intend to carry over, by reading them directly from your old account/project and writing them into the new account/project.
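Step 6 can be done from the command line. The following is a sketch under stated assumptions: the target parameter or secret already exists in the new account/project (created by the terraform apply in step 5) under the same name, you have CLI credentials for both environments, and SecureString is the right parameter type for your case. The function names, profile names, and project IDs are hypothetical.

```shell
# Sketch: copy one secret value from the old environment to the new one.

# AWS: copy an SSM parameter between accounts, using two named CLI profiles.
# $1: parameter name, $2: profile for old account, $3: profile for new account
copy_ssm_parameter() {
  local value
  value="$(aws ssm get-parameter --profile "$2" --name "$1" \
    --with-decryption --query 'Parameter.Value' --output text)" || return 1
  # Assumption: the parameter should be stored as a SecureString.
  aws ssm put-parameter --profile "$3" --name "$1" \
    --type SecureString --overwrite --value "$value" >/dev/null
}

# GCP: add the latest version of a secret in the old project as a new version in the new project.
# $1: secret id, $2: old project id, $3: new project id
copy_gcp_secret() {
  local value
  value="$(gcloud secrets versions access latest --secret="$1" --project="$2")" || return 1
  printf '%s' "$value" | gcloud secrets versions add "$1" --project="$3" --data-file=-
}

# Examples (hypothetical names):
#   copy_ssm_parameter 'PSOXY_RULES' old-account new-account
#   copy_gcp_secret 'PSOXY_RULES' old-project new-project
```

If Terraform also manages the value of a parameter or secret, a later terraform apply may overwrite what you copied, so re-check the plan afterwards.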

Phase 3: Migrate

  1. Look at the TODO 3 files/output variables for all your connectors. Make a mapping between the old values and the new values, and send it to Worklytics. For each connector, it should include the proxy URL, the AWS Role to use, and any other values that are changing.

  2. Wait for confirmation that Worklytics has migrated all your connections to the new values. This may take 1-2 days.
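One way to start the mapping in step 1 is to dump the output variables of both configurations and compare them by hand. This is only a sketch: it assumes the values you need are exposed as Terraform outputs (the TODO 3 files must still be checked separately) and that Terraform 0.14 or later is in use. The function name and file names are hypothetical.

```shell
# Sketch: dump the outputs of the old and new configurations for manual comparison.
# $1: old config dir, $2: new config dir, $3: directory to write the JSON files into
dump_outputs() {
  mkdir -p "$3" || return 1
  terraform -chdir="$1" output -json > "$3/old-outputs.json" || return 1
  terraform -chdir="$2" output -json > "$3/new-outputs.json" || return 1
}

# Example (hypothetical paths):
#   dump_outputs ./old-env ./new-env ./migration-mapping
```

The JSON form of terraform output includes sensitive values in plain text, so remove anything secret (such as the salt) before sharing these files.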

Phase 4: Destroy Old Environment

  1. Remove from the old configuration's Terraform state any API Clients that you identified in Phase 1 and imported into the new configuration:

    • eg, terraform state rm 'module.psoxy.module.msft-connection["azure-ad"].azuread_application.connector'

  2. Run terraform destroy in the old configuration. Carefully review the plan before confirming.

    • if you're using Google Workspace sources, you may see destruction of google_project_service resources; if you allow these to be destroyed, these APIs will be disabled. If you are using the same GCP project in your other configuration, you should run terraform apply there again to re-enable them.

  3. You may also destroy any API clients/etc that are managed outside of Terraform and which you did not migrate to the new environment.

  4. You may clean up any configuration values that you created in your old host environment, such as SSM Parameters / GCP Secrets used to customize the proxy rule sets.
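Steps 1 and 2 of this phase can be sketched as a single helper that detaches the preserved resources and then previews the destroy without executing it. The sketch assumes Terraform 0.14 or later (for -chdir) and a text file, prepared by you, listing one resource address per line. The function name and file name are hypothetical.

```shell
# Sketch: detach preserved API clients from the old state, then preview the destroy.
# $1: old config dir, $2: file listing one resource address per line
detach_and_preview_destroy() {
  while IFS= read -r address; do
    [ -n "$address" ] || continue   # skip blank lines
    terraform -chdir="$1" state rm "$address" </dev/null || return 1
  done < "$2"
  # Preview only: `plan -destroy` shows what would be destroyed without destroying it.
  terraform -chdir="$1" plan -destroy -no-color
}

# Example (hypothetical path and file):
#   detach_and_preview_destroy ./old-env ./preserved-addresses.txt
```

Only after confirming that none of the preserved resources appear in the preview should you run terraform destroy itself.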
