Getting Started
You'll provision the following to host Psoxy in AWS:
S3 buckets, if using the 'bulk' mode to sanitize file data (such as CSVs); see S3 docs
Cognito Pools and Identities, if connecting to Microsoft 365 data sources
The diagram below provides an architecture overview of the 'API' and 'Bulk' mode use-cases.
An AWS Account in which to deploy Psoxy. We strongly recommend you provision one specifically to host Psoxy, as this will create an implicit security boundary, reduce possible conflicts with other infra configured in the account, and simplify eventual cleanup.
You will need the numeric AWS Account ID for this account, which you can find in the AWS Console.
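If you prefer the terminal to the AWS Console, the account ID can also be read with `aws sts get-caller-identity`. The sketch below assumes the AWS CLI is installed and already authenticated against the target account; the helper name `is_aws_account_id` is hypothetical and only sanity-checks the 12-digit format.

```shell
# Hypothetical helper: succeeds only if its argument is exactly 12 digits,
# which is the format of an AWS account ID.
is_aws_account_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${#1}" -eq 12 ]
}

# Look up the account ID of the currently authenticated identity.
# The lookup is skipped entirely when the AWS CLI is not installed.
if command -v aws >/dev/null 2>&1; then
  account_id="$(aws sts get-caller-identity --query Account --output text 2>/dev/null || true)"
  if is_aws_account_id "$account_id"; then
    echo "AWS Account ID: $account_id"
  else
    echo "Could not determine the account ID; is the AWS CLI authenticated?" >&2
  fi
fi
```

Note that this reports the account of whichever identity the CLI is currently using, so confirm it is the account you intend to deploy into.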
If your AWS organization enforces Service Control Policies, ensure that these allow the AWS components required by Psoxy, or exempt the AWS Account in which you will deploy Psoxy from these policies.
If your organization uses any sort of security control enforcement mechanism, you may have to disable those controls, or provide exceptions to them, for your initial deployment. Generally, those controls can then be implemented later by extending our examples. Our protips page provides some guidance on how to extend the base examples to meet more extreme requirements.
A sufficiently privileged AWS Role. You must have an IAM Role within the AWS account with sufficient privileges to (AWS managed policy examples linked):
create IAM roles + policies (eg, IAMFullAccess)
create and update Systems Manager Parameters (eg, AmazonSSMFullAccess)
create and manage Lambdas (eg, AWSLambda_FullAccess)
create and manage S3 buckets (eg, AmazonS3FullAccess)
create CloudWatch Log groups (eg, CloudWatchFullAccess)
(Yes, the use of AWS Managed Policies results in a role with many privileges; that's why we recommend you use a dedicated AWS account to host the proxy, one which is NOT shared with any other use case.)
You will need the ARN of this role.
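As a rough illustration of granting the example managed policies named above, the sketch below prints one `aws iam attach-role-policy` command per policy so they can be reviewed before anything is changed. It assumes the role already exists and that your current identity is allowed to modify it; the role name is a placeholder borrowed from the example ARN used in this guide, and `policy_attach_commands` is a hypothetical helper.

```shell
# Placeholder role name (an assumption); replace with the name of your role.
ROLE_NAME="PsoxyProvisioningRole"

# The AWS managed policies given as examples in this guide.
POLICIES="IAMFullAccess AmazonSSMFullAccess AWSLambda_FullAccess AmazonS3FullAccess CloudWatchFullAccess"

# Print (do not run) one attach command per managed policy.
policy_attach_commands() {
  for policy in $POLICIES; do
    echo "aws iam attach-role-policy --role-name $ROLE_NAME --policy-arn arn:aws:iam::aws:policy/$policy"
  done
}

policy_attach_commands

# Once the role is set up, its ARN can be read with:
#   aws iam get-role --role-name "$ROLE_NAME" --query Role.Arn --output text
```

Printing the commands instead of executing them is a deliberate choice here, since these policies are broad; pipe the output to `sh` only after checking it.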
NOTE: if you're connecting to Microsoft 365 (Azure AD) data sources, you'll also need permissions to create AWS Cognito Identity Pools and add Identities to them, such as arn:aws:iam::aws:policy/AmazonCognitoPowerUser. Some AWS Organizations have Service Control Policies in place that deny this by default, even if you have an IAM role that allows it at an account level.
NOTE: using AWS API Gateway, VPC, or Secrets Manager (not used by default in our examples) will require additional permissions beyond the above.
See protips.md for a guide to creating a least-privileged IAM policy for provisioning.
An authenticated AWS CLI in your provisioning environment. Your environment (eg, shell/etc from which you'll run terraform commands) must be authenticated as an identity that can assume that role. (see next section for tips on options for various environments you can use)
Eg, if your Role is arn:aws:iam::123456789012:role/PsoxyProvisioningRole, you should be able to assume it from that environment using the aws sts assume-role command.
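A minimal sketch of such a check, assuming the AWS CLI is installed and authenticated: it tries to assume the example role for the shortest allowed session (900 seconds) and prints only the assumed-role ARN, not the temporary credentials. The session name and the helper `can_assume_role` are hypothetical; substitute your own role ARN.

```shell
# Example role ARN from this guide; replace it with your own.
ROLE_ARN="arn:aws:iam::123456789012:role/PsoxyProvisioningRole"
# Arbitrary label for the temporary session (an assumption; any valid name works).
SESSION_NAME="psoxy-provisioning-check"

# Hypothetical helper: attempts to assume the role and prints the resulting
# assumed-role ARN. The --query filter keeps the credentials out of the output.
can_assume_role() {
  aws sts assume-role \
    --role-arn "$1" \
    --role-session-name "$2" \
    --duration-seconds 900 \
    --query 'AssumedRoleUser.Arn' \
    --output text
}

# The check is skipped entirely when the AWS CLI is not installed.
if command -v aws >/dev/null 2>&1; then
  if can_assume_role "$ROLE_ARN" "$SESSION_NAME"; then
    echo "OK: this environment can assume $ROLE_ARN"
  else
    echo "FAILED: could not assume $ROLE_ARN" >&2
  fi
fi
```

If the call fails, check both the role's trust policy and, where applicable, any Service Control Policies in your AWS Organization.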
To provision AWS infra, you'll need the aws-cli installed and authenticated on the environment where you'll run terraform.
Here are a few options:
Generate an AWS Access Key for your AWS User.
Run aws configure in a terminal on the machine you plan to use, and configure it with the key you generated in step one.
NOTE: this could even be a GCP Cloud Shell, which may simplify auth if you wish to connect your Psoxy instance to Google Workspace as a data source.
If your organization prefers NOT to authorize the AWS CLI on individual laptops and/or outside AWS, provisioning Psoxy's required infra from an EC2 instance may be an option.
Provision an EC2 instance (or request that your IT/dev ops team provision one for you). We recommend a micro instance with an 8GB disk, running Ubuntu (not Amazon Linux; if you choose that or something else, you may need to adapt these instructions). Be sure to create a PEM key to access it via SSH (unless your AWS Organization/account provides some other SSH solution).
Associate the Role above with your instance (see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html)
Whichever environment you choose, follow the general prerequisite installation instructions.
You'll also need a backend location for your Terraform state (such as an S3 bucket). It can be in any AWS account, as long as the AWS role that you'll use to run Terraform has read/write access to it.
See https://developer.hashicorp.com/terraform/language/settings/backends/s3.
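For orientation, an S3 backend declaration might look like the sketch below, which writes a small `backend.tf` file into the current directory. The bucket name, key, and region are placeholders, not values from this guide, and the bucket must already exist because Terraform does not create it. The file name is also an assumption; the example templates may declare the backend elsewhere, so check for an existing `backend` block before adding this one.

```shell
# Write a minimal S3 backend declaration. This overwrites any existing
# backend.tf in the current directory, so run it in your Terraform config dir
# only after checking that no such file is already there.
cat > backend.tf <<'EOF'
terraform {
  backend "s3" {
    bucket = "my-psoxy-terraform-state" # hypothetical bucket name
    key    = "psoxy/terraform.tfstate"  # hypothetical path within the bucket
    region = "us-east-1"                # region in which the bucket lives
  }
}
EOF

# After writing the file, `terraform init` must be run so that Terraform
# picks up the backend configuration.
```

The role that runs Terraform needs read/write access to that bucket, as noted above.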
Alternatively, you may use a local file system, but this is not recommended for production use, as your Terraform state may contain secrets such as API keys, depending on the sources you connect.
See https://developer.hashicorp.com/terraform/language/settings/backends/local.
The module psoxy-constants is a dependency-free module that provides lists of the AWS managed policies and other constants needed for bootstrapping an AWS account in which your proxy instances will reside.
Once you've fulfilled the prereqs, including having your terraform deployment environment, backend, and AWS account prepared, we suggest you use our AWS example template repo.
Follow the 'Usage' instructions there to continue.