Slack Data

This document describes how to import your Slack Discovery data into Worklytics. It presumes that the Slack Discovery API is enabled for your organization and that you back up the data regularly.

The data will be exported, pseudonymized by Worklytics' Proxy and then imported into Worklytics for analysis. If you do not have Slack Discovery data available, you may consider the Slack Discovery API connector instead.

Common considerations

  • Slack Discovery is enabled for your enterprise and the data is backed up on your premises.

  • Data MUST be in the format described below, or it will be discarded.

  • All files should be in NDJSON format (one message, conversation, or user per row) and GZIP compressed.

  • All files should be in UTF-8 encoding.

  • Examples here are shown in clear text. This is the input expected to be uploaded to the input bucket; it will be pseudonymized appropriately afterwards.

  • To avoid extra processing cost, we recommend discarding data that is not useful for workplace analytics purposes, such as the actual message content or messages sent by bots.
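
The format requirements above (NDJSON, one record per row, UTF-8, GZIP) can be sketched in Python as follows. This is only an illustrative sketch: `write_ndjson_gz` and `read_ndjson_gz` are hypothetical helper names, not part of Worklytics or the Proxy.

```python
import gzip
import json


def write_ndjson_gz(records, path):
    """Write records as NDJSON (one JSON object per line), UTF-8, GZIP compressed.

    Hypothetical helper for illustration. Returns the number of rows written.
    """
    rows = 0
    # "wt" opens the gzip stream in text mode so the UTF-8 encoding is applied.
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as out:
        for record in records:
            # Compact separators keep each object on a single line.
            out.write(json.dumps(record, ensure_ascii=False, separators=(",", ":")))
            out.write("\n")
            rows += 1
    return rows


def read_ndjson_gz(path):
    """Read a file back, mainly useful to sanity-check an export."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        return [json.loads(line) for line in src if line.strip()]
```

Reading a file back with `read_ndjson_gz` before uploading is a cheap way to confirm that every row parses as a single JSON object.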

Folder Structure

The import job expects a folder per date range exported, with the following files in it:

slack-data-YYYYMMDD-YYYYMMDD
↳ channels.ndjson.gz
↳ dms.ndjson.gz
↳ groups.ndjson.gz
↳ mpims.ndjson.gz
↳ users.ndjson.gz
↳ messages-00000.ndjson.gz
↳ messages-00001.ndjson.gz
  ...
↳ messages-000NN.ndjson.gz

An example directory name is “slack-data-20210101-20210401”. The directory name is parsed to extract the period to be imported, in this case from January 1st, 2021 to April 1st, 2021. This is needed to perform some optimizations in the import.
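
To check a folder name before uploading, the naming convention can be parsed with a few lines of Python. This is a sketch under assumptions: `parse_export_folder` is a hypothetical helper, and it does not reproduce how the import job itself parses the name (for example, whether the end date is treated as inclusive is not stated here).

```python
import re
from datetime import date, datetime

# Pattern for "slack-data-YYYYMMDD-YYYYMMDD" as described above.
FOLDER_RE = re.compile(r"^slack-data-(\d{8})-(\d{8})$")


def parse_export_folder(name):
    """Return (start, end) dates encoded in an export folder name.

    Hypothetical helper for illustration; raises ValueError on a bad name.
    """
    match = FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected folder name: {name!r}")
    # strptime also rejects impossible dates such as month 13.
    start = datetime.strptime(match.group(1), "%Y%m%d").date()
    end = datetime.strptime(match.group(2), "%Y%m%d").date()
    if end < start:
        raise ValueError(f"end date precedes start date in {name!r}")
    return start, end


start, end = parse_export_folder("slack-data-20210101-20210401")
print(start, end)
```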

Intervals can be as short as one day, but we recommend exporting data in larger intervals to reduce the number of files to be imported. One week is a good balance between granularity and number of files, with one messages file per day. An example week export would be:

slack-data-20210101-20210107
↳ channels.ndjson.gz
↳ dms.ndjson.gz
↳ groups.ndjson.gz
↳ mpims.ndjson.gz
↳ users.ndjson.gz
↳ messages-20210101.ndjson.gz
↳ messages-20210102.ndjson.gz
    ...
↳ messages-20210107.ndjson.gz
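
The weekly layout above can be generated programmatically. The sketch below uses hypothetical helper names, and it assumes that messages are bucketed into daily files by the UTC date of their `ts` value; the time zone to use is not specified in this document.

```python
from datetime import date, datetime, timedelta, timezone


def export_folder_name(start, end):
    """Build the folder name for a date range (hypothetical helper)."""
    return f"slack-data-{start:%Y%m%d}-{end:%Y%m%d}"


def week_layout(start):
    """Return the folder name and the seven daily message file names."""
    days = [start + timedelta(days=i) for i in range(7)]
    files = [f"messages-{d:%Y%m%d}.ndjson.gz" for d in days]
    return export_folder_name(days[0], days[-1]), files


def daily_message_filename(ts):
    """Map a Slack "ts" string to a daily file name (assumption: UTC days)."""
    day = datetime.fromtimestamp(float(ts), tz=timezone.utc)
    return f"messages-{day:%Y%m%d}.ndjson.gz"


folder, files = week_layout(date(2021, 1, 1))
print(folder)     # slack-data-20210101-20210107
print(files[0])   # messages-20210101.ndjson.gz
print(files[-1])  # messages-20210107.ndjson.gz
```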

Users file (users.ndjson.gz)

Contains all the users up to the date of the export. The most important information is the link between the Slack user id and the email, and whether the user is a bot. Example of the minimum fields needed (not in NDJSON, for readability):

{
  "id": "U999999999",
  "team_id": "D8989898",
  "profile": {
    "email": "jose@worklytics.co"
  },
  "is_bot": false
}

NDJSON equivalent:

{"id":"U999999999","team_id":"D8989898","profile":{"email":"jose@worklytics.co"}, "is_bot":false}
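
If your backup holds full user objects, they can be reduced to the minimum fields shown above before export. A minimal sketch, assuming the input has the same shape as the example; `minimal_user` is a hypothetical helper, and defaulting a missing `is_bot` to `false` is an assumption of this sketch.

```python
import json


def minimal_user(user):
    """Keep only the fields shown in the example above (hypothetical helper)."""
    return {
        "id": user["id"],
        "team_id": user["team_id"],
        # Only the email is kept from the profile; other profile data is dropped.
        "profile": {"email": user.get("profile", {}).get("email")},
        # Assumption: a missing "is_bot" flag is treated as false.
        "is_bot": bool(user.get("is_bot", False)),
    }


full_user = {
    "id": "U999999999",
    "team_id": "D8989898",
    "real_name": "Jose",
    "profile": {"email": "jose@worklytics.co", "phone": "000"},
    "is_bot": False,
}
print(json.dumps(minimal_user(full_user), separators=(",", ":")))
```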

Conversation files by type ([channels|dms|groups|mpims].ndjson.gz)

Each file contains all the conversations by type up to the date of the export.

| Type | Filename | Description |
| --- | --- | --- |
| MPIMS | mpims.ndjson.gz | Multi-person instant messages (1-n conversations) |
| DMS | dms.ndjson.gz | Direct messages (1-1 conversations) |
| Channels | channels.ndjson.gz | Public channels |
| Groups | groups.ndjson.gz | Private channels that are not MPIMs |

Example of minimum fields needed, not in NDJSON format for readability:

{
  "id": "D06TX2RP0",
  "created": 1435454909,
  "members": [
    "U06FG9AKF",
    "U06TX27FW"
  ]
}

NDJSON equivalent:

{"id": "D06TX2RP0", "created": 1435454909, "members": ["U06FG9AKF", "U06TX27FW"]}
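
Since data in the wrong format is discarded, it can help to check each conversation row for the minimum fields before upload. The following is a rough sketch, not the validation the import job performs; `CONVERSATION_FILES`, `REQUIRED_FIELDS`, and `missing_conversation_fields` are names introduced here for illustration.

```python
import json

# File name per conversation type, as listed in the table above.
CONVERSATION_FILES = {
    "channels": "channels.ndjson.gz",
    "dms": "dms.ndjson.gz",
    "groups": "groups.ndjson.gz",
    "mpims": "mpims.ndjson.gz",
}

# Minimum fields shown in the example above.
REQUIRED_FIELDS = ("id", "created", "members")


def missing_conversation_fields(conversation):
    """Return the required field names absent from a conversation record."""
    return [name for name in REQUIRED_FIELDS if name not in conversation]


row = '{"id": "D06TX2RP0", "created": 1435454909, "members": ["U06FG9AKF", "U06TX27FW"]}'
print(missing_conversation_fields(json.loads(row)))  # []
```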

Messages file (messages-XXXXX.ndjson.gz)

Contains all the messages up to the date of the export. For convenience, it can be split into multiple files, as long as they are all in the same folder and follow the naming convention.

Example of minimum fields needed, not in NDJSON for readability:

{
  "client_msg_id": "70299f8c-38ab-4911-835e-f2fe92fea6b6",
  "type": "message",
  "user": "U06FG9AKF",
  "ts": "1620665890.036300",
  "team": "T06FG94SV",
  "reactions": [
    {
      "name": "100",
      "users": [
        "USER_ID"
      ],
      "count": 1
    }
  ],
  "channel_id": "D01VATQPMUG",
  "channel_name": "general"
}

Note that there is no "text" field. It may be present, in which case the proxy will redact it, but removing it upfront makes processing faster.

NDJSON equivalent:

{"client_msg_id":"70299f8c-38ab-4911-835e-f2fe92fea6b6","type":"message","user":"U06FG9AKF","ts":"1620665890.036300","team":"T06FG94SV","reactions":[{"name":"100","users":["USER_ID"],"count":1}],"channel_id":"D01VATQPMUG","channel_name":"general"}
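
Dropping the "text" field and bot messages upfront, as recommended earlier, could look like the sketch below. It assumes bots are identified through the `is_bot` flag in the users file; `bot_user_ids` and `slim_messages` are hypothetical helpers, not part of the Proxy.

```python
def bot_user_ids(users):
    """Collect the ids of users flagged as bots in the users file."""
    return {user["id"] for user in users if user.get("is_bot")}


def slim_messages(messages, bot_ids):
    """Yield messages without "text", skipping those sent by bot users."""
    for message in messages:
        if message.get("user") in bot_ids:
            continue  # messages sent by bots are not exported
        # Copy every field except the message content.
        yield {key: value for key, value in message.items() if key != "text"}


users = [
    {"id": "U06FG9AKF", "is_bot": False},
    {"id": "UBOT00001", "is_bot": True},
]
messages = [
    {"type": "message", "user": "U06FG9AKF", "ts": "1620665890.036300", "text": "hello"},
    {"type": "message", "user": "UBOT00001", "ts": "1620665891.000000", "text": "beep"},
]
slimmed = list(slim_messages(messages, bot_user_ids(users)))
print(slimmed)
```

The input records are left unmodified; each yielded message is a new dictionary.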

Configuring Connection in Worklytics

  1. Log in to Worklytics (your user needs the Data Connection Admin role).

  2. Select Bulk Data Import via Psoxy

  3. Follow the steps and fill all the fields to connect to your sanitized bucket.

    • Select Data Source Processing: slack-discovery

  4. Connect
