getting-started.md
Tips and tricks for using GCP to host the proxy.
Some organizations have policies that block authentication of the gcloud CLI client, requiring you to contact your IT team to have it added to an approved list. Apart from that, there are several workarounds:
use the GCP Cloud Shell (via the GCP web console). gcloud is pre-installed and pre-authorized as your Google user in Cloud Shell.
use a VM in GCP Compute Engine, with the VM running as a sufficiently privileged service account. In such a scenario, gcloud will be pre-authenticated by GCP on the VM as that service account.
create credentials within the project itself:
enable IAM API and Cloud Resource Manager API within the project
create OAuth credentials for a 'desktop application' within the target GCP project
download the client-secrets.json file to your environment
run gcloud auth application-default login --client-id-file=/path/to/client-secrets.json
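For the last option, the login step can be wrapped in a small shell function that first checks the downloaded file exists. This is a minimal sketch: adc_login_with_project_client and the GCLOUD override (set GCLOUD=echo to print the command instead of running it) are hypothetical conveniences, not part of psoxy; the gcloud command itself is the one shown above.

```shell
#!/usr/bin/env bash
# Sketch: authenticate Application Default Credentials (what Terraform's Google
# provider uses) with an OAuth 'desktop application' client from your own project.
# Hypothetical helper; GCLOUD can be overridden (e.g. GCLOUD=echo) for a dry run.
adc_login_with_project_client() {
  local secrets_file="$1"
  local gcloud_cmd="${GCLOUD:-gcloud}"

  if [[ ! -f "$secrets_file" ]]; then
    echo "client secrets file not found: $secrets_file" >&2
    return 1
  fi

  # Runs the browser login flow using your project's OAuth client instead of gcloud's default one
  "$gcloud_cmd" auth application-default login --client-id-file="$secrets_file"
}
```

Usage, with a hypothetical path: adc_login_with_project_client ~/Downloads/client-secrets.json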
Terraform relies on GCP's REST APIs for its operations. If these APIs are disabled in either the target project OR the project in which the identity (service account, OAuth client) under which you're running Terraform resides, you may get an error.
The solution is to enable APIs via the Cloud Console, specifically:
IAM API
Cloud Resource Manager API
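Where you do have a working gcloud (e.g. in Cloud Shell), the same two APIs can be enabled from the command line in both projects. A sketch, assuming the caller may enable services (e.g. Service Usage Admin) in each project; the function name and GCLOUD override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: enable the IAM and Cloud Resource Manager APIs in every project given,
# e.g. the target project AND the project that owns your Terraform identity.
enable_terraform_core_apis() {
  local gcloud_cmd="${GCLOUD:-gcloud}"
  local project
  for project in "$@"; do
    "$gcloud_cmd" services enable \
      iam.googleapis.com \
      cloudresourcemanager.googleapis.com \
      --project="$project" || return 1
  done
}
```

For example, with hypothetical project ids: enable_terraform_core_apis my-proxy-project my-terraform-identity-project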
If some resources seem not to be properly provisioned, try terraform taint or terraform state rm to force re-creation. Use terraform state list | grep to search for specific resource ids.
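A sketch of those commands as reusable helpers (the helper names and the TERRAFORM override are hypothetical). It uses terraform apply -replace=ADDRESS, which Terraform recommends over terraform taint as of v0.15.2; taint still works but is deprecated.

```shell
#!/usr/bin/env bash
# Sketch of helpers for forcing re-creation of a misbehaving resource.
# TERRAFORM can be overridden (e.g. TERRAFORM=echo) for a dry run.

# List state addresses matching a grep pattern, e.g. 'google_cloudfunctions_function'
find_state_addresses() {
  local pattern="$1"
  "${TERRAFORM:-terraform}" state list | grep -- "$pattern"
}

# Plan and apply re-creation of a single resource (replacement for 'terraform taint')
replace_resource() {
  "${TERRAFORM:-terraform}" apply -replace="$1"
}

# Remove a resource from state only; the next apply will try to create it again
forget_resource() {
  "${TERRAFORM:-terraform}" state rm "$1"
}
```

Note that terraform state rm leaves the real resource in place, so the next apply may fail with an "already exists" error unless you delete that resource or terraform import it first.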
If you receive an error such as:
This may be due to an Organization Policy that restricts the domains that can be used in IAM policies. See https://cloud.google.com/resource-manager/docs/organization-policy/restricting-domains
You may need to define an exception for the GCP project in which you're deploying the proxy, or add the domain of your Worklytics tenant SA to the list of allowed domains.
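To see which domain-restriction policy is actually in effect for the project (including anything inherited from a folder or the organization), something like the following can help. A sketch; the function name and GCLOUD override are hypothetical, and reading org policies requires appropriate permissions (e.g. Organization Policy Viewer).

```shell
#!/usr/bin/env bash
# Sketch: show the effective iam.allowedPolicyMemberDomains policy for a project.
show_domain_restriction_policy() {
  local project="$1"
  "${GCLOUD:-gcloud}" resource-manager org-policies describe \
    iam.allowedPolicyMemberDomains \
    --project="$project" \
    --effective
}
```

Usage, with a hypothetical project id: show_domain_restriction_policy my-proxy-project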
This page provides an overview of how psoxy authenticates and confirms authorization of clients (Worklytics tenants) to access data for GCP-hosted deployments.
As Worklytics tenants run inside GCP, they are implicitly authenticated by GCP. No secrets or keys need be exchanged between your Worklytics tenant and your Psoxy instance. GCP can verify the identity of requests from Worklytics to your instance, just as it does between any process and resource within GCP.
Invocations of your proxy instances are authorized by the IAM policies you define in GCP. For API connectors, you grant the Cloud Function Invoker role to your Worklytics tenant's GCP service account on the Cloud Function for your instance.
For the bulk data case, you grant the Storage Object Viewer role to your Worklytics tenant's GCP service account on the sanitized output bucket for your connector.
You can obtain the identity of your Worklytics tenant's GCP service account from the Worklytics portal.
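If you need to create these bindings by hand (for example, to check or reproduce what your Terraform configuration manages), the equivalent gcloud commands look roughly like this. A sketch only: the function names, GCLOUD override and example values are hypothetical; the tenant SA email comes from the Worklytics portal, and the function name, region and bucket from your own deployment.

```shell
#!/usr/bin/env bash
# Sketch: manual equivalents of the grants described above.

# API connector: let the tenant SA invoke one Cloud Function
grant_tenant_invoker() {
  local function_name="$1" region="$2" tenant_sa="$3"
  "${GCLOUD:-gcloud}" functions add-iam-policy-binding "$function_name" \
    --region="$region" \
    --member="serviceAccount:${tenant_sa}" \
    --role="roles/cloudfunctions.invoker"
}

# Bulk data: let the tenant SA read objects in the sanitized output bucket
grant_tenant_bucket_reader() {
  local bucket="$1" tenant_sa="$2"
  "${GCLOUD:-gcloud}" storage buckets add-iam-policy-binding "gs://${bucket}" \
    --member="serviceAccount:${tenant_sa}" \
    --role="roles/storage.objectViewer"
}
```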
clone the repo (or a private fork of it)
if using Microsoft 365 sources, install and authenticate Azure CLI: https://docs.microsoft.com/en-us/cli/azure/install-azure-cli
if deploying AWS infra, install and authenticate AWS CLI
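A quick, hypothetical helper for confirming the CLIs you need are installed before continuing (check_clis is not part of the repo):

```shell
#!/usr/bin/env bash
# Sketch: report which of the given CLIs are on PATH.
# Returns non-zero if any of them is missing.
check_clis() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_clis git gcloud terraform, adding az and/or aws as applicable. To confirm each CLI is also authenticated, gcloud auth list, az account show and aws sts get-caller-identity each report the active identity.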
You should now be ready for the general instructions in the README.md.
You'll provision infrastructure that ultimately looks as follows:
This includes:
Cloud Functions
Service Accounts
Secret Manager Secrets, to hold pseudonymization salt, encryption keys, and data source API keys
Cloud Storage Buckets (GCS), if using psoxy to sanitize bulk file data, such as CSVs
NOTE: if you're connecting to Google Workspace as a data source, you'll also need to provision Service Account Keys and activate Google Workspace APIs.
a Google Project
we recommend a dedicated GCP project for your deployment, to provide an implicit security boundary around your infrastructure as well as simplify monitoring/cleanup
a GCP (Google) user or Service Account with permissions to provision Service Accounts, Secrets, Storage Buckets, and Cloud Functions, and to enable APIs within that project, e.g.:
Cloud Functions Admin - proxy instances are deployed as GCP cloud functions
Cloud Storage Admin - processing of bulk data (such as HRIS exports) uses GCS buckets
IAM Role Admin - create custom roles for the proxy, to follow principle of least privilege
Secret Manager Admin - your API keys and pseudonymization salt are stored in Secret Manager
Service Account Admin - administer the Service Accounts that act as the identities of Cloud Functions or are used for Google Workspace API connections
Service Usage Admin - you will need to enable various GCP APIs
the following APIs enabled in the project: (via GCP Console)
IAM Service Account Credentials API (iamcredentials.googleapis.com) - generally needed to support authenticating Terraform. May not be needed if you're running terraform within a GCP environment.
Service Usage API (serviceusage.googleapis.com)
additional APIs enabled in the project (using the Service Usage API above, our Terraform will attempt to enable these, but as there is sometimes a delay of a few minutes in activation, and in some cases they are required to read your existing infra prior to apply, you may experience errors). To pre-empt those, we suggest ensuring the following are enabled:
Compute Engine API (compute.googleapis.com)
Cloud Build API (cloudbuild.googleapis.com)
Cloud Functions API (cloudfunctions.googleapis.com)
Cloud Resource Manager API (cloudresourcemanager.googleapis.com)
IAM API (iam.googleapis.com)
Secret Manager API (secretmanager.googleapis.com)
Storage API (storage-api.googleapis.com)
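The prerequisites above can be sketched as two shell functions: one grants the example roles to the identity that runs Terraform, the other enables whichever listed APIs are not yet enabled. Assumptions: the role IDs are the standard IDs for the role titles listed (the psoxy-constants module described below is the better source for the exact list your version needs), the caller already has enough rights (e.g. project Owner) to grant roles and enable services, and the function names and GCLOUD override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the prerequisites above. GCLOUD can be overridden for a dry run.

# Role IDs for the example role titles listed above
PSOXY_TERRAFORM_ROLES=(
  roles/cloudfunctions.admin           # Cloud Functions Admin
  roles/storage.admin                  # Storage Admin
  roles/iam.roleAdmin                  # Role Administrator (IAM Role Admin)
  roles/secretmanager.admin            # Secret Manager Admin
  roles/iam.serviceAccountAdmin        # Service Account Admin
  roles/serviceusage.serviceUsageAdmin # Service Usage Admin
)

# APIs listed above, by service name
PSOXY_GCP_APIS=(
  iamcredentials.googleapis.com
  serviceusage.googleapis.com
  compute.googleapis.com
  cloudbuild.googleapis.com
  cloudfunctions.googleapis.com
  cloudresourcemanager.googleapis.com
  iam.googleapis.com
  secretmanager.googleapis.com
  storage-api.googleapis.com
)

# MEMBER is e.g. 'serviceAccount:terraform@PROJECT.iam.gserviceaccount.com'
grant_terraform_roles() {
  local project="$1" member="$2" role
  for role in "${PSOXY_TERRAFORM_ROLES[@]}"; do
    # --condition=None adds an unconditional binding, avoiding a prompt if the
    # project policy already contains conditional bindings
    "${GCLOUD:-gcloud}" projects add-iam-policy-binding "$project" \
      --member="$member" --role="$role" --condition=None || return 1
  done
}

# Enable only the listed APIs that are not already enabled
enable_missing_apis() {
  local project="$1" api enabled
  local -a missing=()
  # One enabled service name per line, e.g. 'iam.googleapis.com'
  enabled="$("${GCLOUD:-gcloud}" services list --enabled \
    --project="$project" --format='value(config.name)')" || return 1

  for api in "${PSOXY_GCP_APIS[@]}"; do
    if ! grep -qxF -- "$api" <<<"$enabled"; then
      missing+=("$api")
    fi
  done

  if ((${#missing[@]} == 0)); then
    echo "all listed APIs already enabled in $project" >&2
    return 0
  fi
  "${GCLOUD:-gcloud}" services enable "${missing[@]}" --project="$project"
}
```

For example, with hypothetical ids: grant_terraform_roles my-proxy-project serviceAccount:terraform@my-proxy-project.iam.gserviceaccount.com, then enable_missing_apis my-proxy-project.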
You'll also need a secure backend location for your Terraform state (such as a GCS or S3 bucket). It need not be in the same host platform/project/account to which you are deploying the proxy, as long as the Google/AWS user you are authenticated as when running Terraform has permissions to access it.
Some options:
GCS : https://developer.hashicorp.com/terraform/language/settings/backends/gcs
S3 : https://developer.hashicorp.com/terraform/language/settings/backends/s3
Alternatively, you may use a local file system, but this is not recommended for production use, as your Terraform state may contain secrets such as API keys, depending on the sources you connect.
See: https://developer.hashicorp.com/terraform/language/settings/backends/local
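For the GCS option, a sketch of creating a state bucket and initializing Terraform against it. Assumptions: your Terraform configuration declares an empty backend "gcs" {} block, so bucket and prefix can be supplied at init time via -backend-config (Terraform's partial backend configuration); the function names, overrides and example values are hypothetical. Object versioning is turned on because the Terraform GCS backend docs recommend it for recovering earlier state.

```shell
#!/usr/bin/env bash
# Sketch: versioned GCS bucket for Terraform state, plus 'terraform init' against it.
# GCLOUD / TERRAFORM can be overridden (e.g. =echo) for a dry run.
create_state_bucket() {
  local bucket="$1" project="$2" location="$3"
  "${GCLOUD:-gcloud}" storage buckets create "gs://${bucket}" \
    --project="$project" \
    --location="$location" \
    --uniform-bucket-level-access || return 1
  # Keep previous versions of the state file so it can be recovered if overwritten
  "${GCLOUD:-gcloud}" storage buckets update "gs://${bucket}" --versioning
}

init_gcs_backend() {
  local bucket="$1" prefix="$2"
  "${TERRAFORM:-terraform}" init \
    -backend-config="bucket=${bucket}" \
    -backend-config="prefix=${prefix}"
}
```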
For some help in bootstrapping a GCP environment, see also: infra/modules/gcp-bootstrap/README.md
The module psoxy-constants is a dependency-free module that provides lists of GCP roles, etc., needed for bootstrapping a GCP project in which your proxy instances will reside.
The https://github.com/Worklytics/psoxy-example-gcp repo provides an example configuration for hosting proxy instances in GCP. You can use that template, following its Usage docs to get started.
the 'Service Account' approach described in the prerequisites is preferable to giving a Google user account IAM roles to administer your infrastructure directly. You can pass this Service Account's email address to Terraform by setting the gcp_terraform_sa_account_email variable. Your machine/environment's CLI must be authenticated as a GCP entity that can impersonate this Service Account, which typically means it can create tokens as it (Service Account Token Creator role).
using a dedicated GCP project is superior to using a shared project, as it provides an implicit security boundary around your infrastructure as well as simplifying monitoring/cleanup. The IAM roles specified in the prerequisites must be granted at the project level, so any non-Proxy infrastructure within the GCP project that hosts your proxy instances will be accessible to the user / service account who's managing the proxy infrastructure.
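For the Service Account approach, a sketch of granting a user the Service Account Token Creator role on the Terraform Service Account and checking that impersonation works. The function names, GCLOUD override and example emails are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: allow a user to impersonate the Terraform Service Account.
allow_impersonation() {
  local terraform_sa="$1" user_email="$2"
  "${GCLOUD:-gcloud}" iam service-accounts add-iam-policy-binding "$terraform_sa" \
    --member="user:${user_email}" \
    --role="roles/iam.serviceAccountTokenCreator"
}

# Succeeds only if the current gcloud identity can mint an access token as the SA
check_impersonation() {
  local terraform_sa="$1"
  "${GCLOUD:-gcloud}" auth print-access-token \
    --impersonate-service-account="$terraform_sa" >/dev/null
}
```

Once check_impersonation succeeds, pass the Service Account's email via the gcp_terraform_sa_account_email variable (e.g. in your terraform.tfvars).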
With those, you can run locally via IntelliJ, using run configs (located in .idea/runConfigurations):
package install core
builds the core JAR, on which implementations depend
gcp - run gmail
builds and runs a local instance for GMail
Or from command line:
By default, that serves the function from http://localhost:8080.
1.) run terraform init and terraform apply from infra/dev-personal to provision the environment
2.) run locally via IntelliJ run config
3.) execute the following to verify your proxy is working OK
Health check (verifies that your client can reach and invoke the proxy at all, and that it has a sensible config)
Using a message id you grab from that:
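To grab that message id without extra tooling, a small helper can parse it out of the list response. A minimal sketch: it assumes the proxy returns JSON shaped like the Gmail API's messages.list response, it is not a real JSON parser (jq is more robust if available), and the helper name and request path in the example comment are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: print the first message id from a messages.list-style JSON response on stdin.
# Assumes the shape {"messages":[{"id":"...","threadId":"..."}, ...]}.
first_message_id() {
  grep -o '"id" *: *"[^"]*"' | head -n 1 | sed 's/.*: *"\([^"]*\)"$/\1/'
}

# Example against the local instance (path assumed to mirror the Gmail API):
#   msg_id="$(curl -sS http://localhost:8080/gmail/v1/users/me/messages | first_message_id)"
#   curl -sS "http://localhost:8080/gmail/v1/users/me/messages/${msg_id}"
```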
1.) deploy to GCP using Terraform (see infra/). Follow steps in any TODO files it generates.
2.) Set your env vars (these should be in a TODO file generated by Terraform in the previous step):
3.) grant yourself access (probably not needed if you have primitive role in project, like Owner or Editor)
alternatively, you can add Terraform resource for this to your Terraform config, and apply it again:
Either way, if this function is for prod use, please remove these grants after you're finished testing.
4.) invocation examples
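Steps 3 and 4 can be sketched as follows: a helper that adds (and later removes) the Cloud Function Invoker binding for your own user, and a GET helper that attaches an identity token from gcloud auth print-identity-token when calling a deployed function. Function names, region, URL and API path in the example comments are hypothetical placeholders; take the real values from the TODO files Terraform generated.

```shell
#!/usr/bin/env bash
# Sketch of steps 3-4 above. GCLOUD / CURL can be overridden for dry runs.

# ACTION is 'add' (before testing) or 'remove' (after testing)
set_self_invoker() {
  local action="$1" function_name="$2" region="$3" user_email="$4"
  case "$action" in
    add | remove) ;;
    *) echo "action must be 'add' or 'remove'" >&2; return 2 ;;
  esac
  "${GCLOUD:-gcloud}" functions "${action}-iam-policy-binding" "$function_name" \
    --region="$region" \
    --member="user:${user_email}" \
    --role="roles/cloudfunctions.invoker"
}

# GET a proxied API path. Deployed functions that require IAM authorization expect
# a Google-signed identity token; a local instance on localhost is called without one.
psoxy_get() {
  local base_url="$1" path="$2"
  local -a auth=()
  if [[ "$base_url" != http://localhost* ]]; then
    auth=(-H "Authorization: Bearer $("${GCLOUD:-gcloud}" auth print-identity-token)")
  fi
  "${CURL:-curl}" -sS "${auth[@]}" "${base_url%/}${path}"
}

# Example (hypothetical values):
#   me="$(gcloud config get-value account)"
#   set_self_invoker add psoxy-gmail us-central1 "$me"
#   psoxy_get "https://us-central1-my-project.cloudfunctions.net/psoxy-gmail" /gmail/v1/users/me/messages
#   set_self_invoker remove psoxy-gmail us-central1 "$me"
```

The final remove call corresponds to dropping the test grant once you're done, as recommended above for prod functions.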
The apply (java, maven, etc).