This document describes how to import your Slack Discovery data into Worklytics. It presumes the Slack Discovery API is enabled for your organization and that you are backing up the data regularly. The data will be exported, pseudonymized by Worklytics' proxy (Psoxy), and then imported into Worklytics for analysis. If you do not have Slack Discovery data available, you may consider the Slack Discovery API connector instead.
Prerequisites: Slack Discovery is enabled for your enterprise, and the data is backed up on your premises.
Data MUST be in the format described below, or it will be discarded:

- All files should be in NDJSON format (one message/conversation/user per row), GZIP compressed.
- All files should be UTF-8 encoded.
- Examples here are shown in the clear; this is the input expected in the input bucket, which will later be pseudonymized appropriately.
- To avoid extra processing cost, we recommend discarding fields that are not useful for workplace analytics purposes, such as the actual message content or messages sent by bots (see the sketch after this list).
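A minimal sketch of this pre-processing in Python, assuming the backup tool produces a JSON array of messages per file (the filenames, and treating `bot_id` as the bot marker, are assumptions):

```python
import gzip
import json

# Fields that aren't useful for workplace analytics; dropping them
# upfront (especially message content) reduces processing cost.
DROP_FIELDS = {"text"}

# Hypothetical input: a raw backup file holding a JSON array of messages.
with open("messages-20210503.json", encoding="utf-8") as f:
    messages = json.load(f)

# Output: GZIP-compressed NDJSON, one message per line, UTF-8 encoded.
with gzip.open("messages-20210503.ndjson.gz", "wt", encoding="utf-8") as out:
    for msg in messages:
        if msg.get("bot_id"):  # assumption: bot-authored messages carry a bot_id
            continue
        slim = {k: v for k, v in msg.items() if k not in DROP_FIELDS}
        out.write(json.dumps(slim, ensure_ascii=False) + "\n")
```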
The import job expects a folder per exported date range, with the following files in it.

An example directory name could be `slack-data-20210101-20210401`; the directory name is parsed to extract the period to be imported, in this case from January 1st, 2021 to April 1st, 2021. This is needed to perform some optimizations during the import.

Intervals can be as short as a single day, but we recommend exporting data over larger intervals to reduce the number of files to be imported. One week is a good balance between granularity and number of files, with one file per day for messages. So an example week export would be:
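For instance, a week's export folder might look like the listing below (conversation filenames per the table further down; the users filename and per-day message filenames are assumed conventions):

```
slack-data-20210503-20210509/
  users.ndjson.gz
  mpims.ndjson.gz
  dms.ndjson.gz
  channels.ndjson.gz
  groups.ndjson.gz
  messages-20210503.ndjson.gz
  ...
  messages-20210509.ndjson.gz
```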
Contains all the users up to the date of the export. The most important things are the linking between the Slack user id and the email, and whether the user is a bot or not. Example of minimum fields needed (pretty-printed, not NDJSON, for readability):
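```json
{
  "id": "U999999999",
  "team_id": "D8989898",
  "profile": {
    "email": "jose@worklytics.co"
  },
  "is_bot": false
}
```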
NDJSON equivalent:

```json
{"id":"U999999999","team_id":"D8989898","profile":{"email":"jose@worklytics.co"},"is_bot":false}
```
Each file contains all the conversations of the given type up to the date of the export:

| Type | Filename | Description |
|---|---|---|
| MPIMS | `mpims.ndjson.gz` | Multi-person instant messages (1-n conversations) |
| DMS | `dms.ndjson.gz` | Direct messages (1-1 conversations) |
| Channels | `channels.ndjson.gz` | Public channels |
| Groups | `groups.ndjson.gz` | Private channels that are not MPIMs |
Example of minimum fields needed (pretty-printed, not NDJSON, for readability):
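```json
{
  "id": "D06TX2RP0",
  "created": 1435454909,
  "members": [
    "U06FG9AKF",
    "U06TX27FW"
  ]
}
```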
NDJSON equivalent:

```json
{"id": "D06TX2RP0", "created": 1435454909, "members": ["U06FG9AKF", "U06TX27FW"]}
```
Contains all the messages up to the date of the export. For convenience, messages can be split into multiple files, as long as all are under the same folder and follow the naming convention. Example of minimum fields needed (pretty-printed, not NDJSON, for readability):
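```json
{
  "client_msg_id": "70299f8c-38ab-4911-835e-f2fe92fea6b6",
  "type": "message",
  "text": "",
  "user": "U06FG9AKF",
  "ts": "1620665890.036300",
  "team": "T06FG94SV",
  "reactions": [
    {
      "name": "100",
      "users": ["USER_ID"],
      "count": 1
    }
  ],
  "channel_id": "D01VATQPMUG",
  "channel_name": "general"
}
```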
Note the absence of message content in the `text` field: it could be present, and the proxy will redact it, but removing it upfront will make the processing faster.
NDJSON equivalent:

```json
{"client_msg_id":"70299f8c-38ab-4911-835e-f2fe92fea6b6","type":"message","text":"","user":"U06FG9AKF","ts":"1620665890.036300","team":"T06FG94SV","reactions":[{"name":"100","users":["USER_ID"],"count":1}],"channel_id":"D01VATQPMUG","channel_name":"general"}
```
To connect the import in Worklytics:

1. Log in to Worklytics (your user needs the Data Connection Admin role).
2. Select Bulk Data Import via Psoxy.
3. Follow the steps and fill in all the fields to connect to your sanitized bucket.
4. Select the Data Source Processing: `slack-discovery`.
5. Click Connect.
For certain data types, Worklytics supports import of data from bulk files, either uploaded directly to Worklytics via our web portal or from a cloud storage location (such as Google Cloud Storage, AWS S3, etc). This document provides an overview of that process and the supported file formats for each data type.
The data flow is as follows:

1. Export your data to a file (`.csv`).
2. Upload the flat file directly to Worklytics (via our Web App) or to a cloud storage location that you've connected to Worklytics (an AWS S3 bucket, a Google Cloud Storage bucket, etc.).
3. Worklytics parses the file and loads your data into its system (eg, updates our representation of your HRIS data accordingly, appends the survey results, etc.).
If you're using our sanitization/pseudonymization proxy (Psoxy), the second step includes an intermediary phase: the file is loaded into a storage location within your premises, from where our proxy (also running in your premises) applies pseudonymization and places the sanitized result into a second storage location (also in your premises). You may either download the sanitized file from that location and upload it via our Web App, or connect Worklytics directly to the location of the sanitized file (preferred).
File names should use only `[A-Za-z0-9_.-]` characters. Do not use whitespace or special characters, to avoid transfer problems to/from various cloud storage providers (*).

The suffix after the last `.` is expected to be a file format (eg, `.csv`).

The last 8 characters prior to the last `.` are expected to be a date in `YYYYMMDD` (ISO 8601 basic) format, if applicable for the file type. This is expected to be the effective date of the file, although the semantics of how this value is interpreted may vary by import type.
(*) See the file naming considerations for Google Cloud Storage and Amazon Simple Storage Service.
For data in Comma-Separated Values format (`.csv` or CSV), you must follow the formatting requirements and conventions described in this section.

We aim to parse CSV as specified in RFC 4180, except as specified here (and in later sections for each particular data type: HRIS, Surveys, etc).
In particular, each file MUST include a header row as the first line of the file, specifying column names (this is NOT required by RFC 4180). This header line is interpreted as follows:
- column names are interpreted case-insensitively (`EMPLOYEE_ID` is equivalent to `employee_id`)
- leading/trailing whitespace in column names is trimmed (ignored) (` EMPLOYEE_ID ` is equivalent to `EMPLOYEE_ID`)
- column names MAY be enclosed in double quotes (`"EMPLOYEE_ID"` is equivalent to `EMPLOYEE_ID`)
- column names containing commas MUST be enclosed in double quotes
The ordering of columns within a file does not matter and need not be consistent across files.
The following table summarizes the value types used in the tables below:

| TYPE | VALUES ACCEPTED |
|---|---|
| STRING | Any UTF-8 characters |
| BOOLEAN | `TRUE`; any other value is parsed as false |
| FLOAT | Numerical values |
| TIME_OF_DAY | `HH:MM`, 24H format |
| DATE | ISO formats: `yyyy-MM-dd`, `yyyyMMdd`, `yyyy/MM/dd`, `dd/MM/yyyy`, `dd-MM-yyyy`; US formats: `MM-dd-yy`, `MM/dd/yy`, `MM-dd-yyyy`, `MM/dd/yyyy` |
| DATETIME | ISO instant format, UTC: `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` |
NOTES:

- Any value MAY be enclosed with matching double-quotes (`"`). If so, and the value itself contains a double-quote, it MUST be escaped by preceding it with another double-quote (`"aaa","b""bb","ccc"`).
- Any value that contains a comma (`,`) MUST be enclosed in double-quotes (`"`). Eg, a row intended to contain the value `Smith, John` in the second column must be formatted as `valueA,"Smith, John",valueC`.
- Do NOT mix `DATE` formats within a single file type, as this is potentially ambiguous. Use only one or the other, and set the format on the connection. We prefer `ISO`, specifically `yyyy-MM-dd` (eg, `2022-12-09`), as the most readable and unambiguous.
Any identifiers, such as employee identifiers (`EMPLOYEE_ID`, `MANAGER_ID`, etc), MUST be formatted consistently across ALL data sources and import files for which you intend them to match, except for leading/trailing whitespace. That is, identifiers that refer to the same entity (person) MUST be **byte-wise equivalent** after trimming any leading/trailing whitespace. Eg, `0123` will NOT match `123`, `abc` will NOT match `ABC`, etc.

Email addresses are a special case: pseudonyms are generated based on a canonicalization of the domain and mailbox, following typical email address semantics. For example, addresses are handled case-insensitively, `.` in the mailbox portion is ignored, etc. Eg, `alicedoe@acme.com` will result in the same pseudonym as `alice.doe@acme.com`, `Alice.Doe@Acme.com`, etc.
For example:
File 1: HRIS Import
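An illustrative snippet (the id `E-001` is from the example; other values are hypothetical):

```csv
EMPLOYEE_ID,EMPLOYEE_EMAIL,JOIN_DATE,LEAVE_DATE,MANAGER_ID
E-001,alice.doe@acme.com,2021-06-15,,
```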
File 2: Survey data
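An illustrative snippet (the id `e1` is from the example; other values are hypothetical):

```csv
SNAPSHOT,EMPLOYEE_ID,SURVEY_QUESTION_1
2022-12-09,e1,4
```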
As the `EMPLOYEE_ID` of the same person is `E-001` in one file and `e1` in the other, Worklytics will NOT match these rows as references to the same individual.
Please verify that all sources you intend to connect to Worklytics provide employee ids in the same format.
Worklytics supports ingesting files compressed with GZIP. We strongly recommend you utilize this to improve performance and reduce cost.

There are two supported ways to indicate that your file is compressed:

- Set the `Content-Encoding` metadata on the storage object to `gzip`. This is the most standards-compliant approach and plays well with native GCS/S3 tooling, so it is our preferred method (see the example after this list).
- Append the `.gz` suffix to the file name (DEPRECATED; may be removed from January 30, 2025).
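For example, a minimal sketch using boto3 (bucket and file names are hypothetical) that gzips a CSV and sets `Content-Encoding` at upload:

```python
import gzip
import shutil

import boto3

# Compress the export; the object will keep its plain .csv name,
# since the .gz-suffix convention is deprecated.
with open("employee_snapshot_20240601.csv", "rb") as src, \
        gzip.open("employee_snapshot_20240601.csv.gz.tmp", "wb") as dst:
    shutil.copyfileobj(src, dst)

# Upload with the Content-Encoding metadata set to gzip.
s3 = boto3.client("s3")
s3.upload_file(
    "employee_snapshot_20240601.csv.gz.tmp",
    "your-input-bucket",               # hypothetical bucket name
    "employee_snapshot_20240601.csv",  # key keeps the .csv name
    ExtraArgs={"ContentEncoding": "gzip", "ContentType": "text/csv"},
)
```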
Prerequisites:

- As this data will be linked by `EMPLOYEE_ID`, you MUST also provide HRIS data to map `EMPLOYEE_ID` to individuals by email in your organization.

Suggested filename: `badge_data_YYYYMMDD.csv`
Worklytics can import badge swipe data, parsing swipe records as work events to gain insights into physical office usage. Each row is parsed as a single badge swipe event.

If you have multiple physical buildings (or other spaces gated by swipe entry), you can provide an identifying string for each swipe event. With each swipe event, you can also provide a string identifying the individual's assigned building. The identifying strings used for `BUILDING_ID` and `BUILDING_ASSIGNED` must be consistent.
| Header Name | Required | Type | Description |
|---|---|---|---|
| EMPLOYEE_ID | Yes | STRING | Id of individual in org's HR schema |
| SWIPE_DATE | Yes | DATETIME | Date and time the employee entered/exited the location, UTC |
| BUILDING_ID | No | STRING | The building this employee accessed |
| BUILDING_ASSIGNED | No | STRING | The building where this employee is assigned |
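An illustrative file (ids and building names are hypothetical; `SWIPE_DATE` uses the `DATETIME` format from the types table above):

```csv
EMPLOYEE_ID,SWIPE_DATE,BUILDING_ID,BUILDING_ASSIGNED
E-001,2024-06-03T08:45:00.000Z,LON-HQ,LON-HQ
E-002,2024-06-03T09:02:00.000Z,LON-HQ,MAD-01
```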
Prerequisites:

- If you will link this data by `EMPLOYEE_ID` rather than `EMPLOYEE_EMAIL`, you MUST also provide HRIS data to link `EMPLOYEE_ID` to individuals by email in your organization.

Suggested filename: `survey_data_YYYYMMDD.csv`
This file provides input on survey data for employees during their lifetime at the company. We only support questions with response types that can be encoded as numerical values.
| Field Name | Required | Type | Description |
|---|---|---|---|
| SNAPSHOT | Yes | DATE | When the individual provided the survey response |
| EMPLOYEE_ID | If EMPLOYEE_EMAIL absent | STRING | Survey respondent's id in organization's HRIS data source |
| EMPLOYEE_EMAIL | If EMPLOYEE_ID absent | STRING | Survey respondent's primary email address |
| SURVEY_QUESTION_1 (1) | Yes (2) | FLOAT (3) | The answer to the 1st survey question, ideally a 1-5 scale (or 1-10). 1 means worst value |
| ... | | | |
| SURVEY_QUESTION_N | Yes (2) | FLOAT | The answer to the Nth survey question, ideally a 1-5 scale (or 1-10). 1 means worst value |
(1) This field name identifies a distinct question in the survey. Eg, `SURVEY_QUESTION_1` is expected to refer to the equivalent question in this and any other survey data file. If this question is omitted in future surveys, do not include the field (column) in future files. If questions are added in future surveys, use a unique variation of `SURVEY_QUESTION_N` that you have not used before.

(2) At least one question should be provided.

(3) Encode survey answers numerically. Eg, as integer (`1`, or `1.0`, etc; if categorical) or float (`0.7`, etc; if continuous). String values that cannot be parsed as a number are discarded.
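An illustrative file (ids and answers are hypothetical):

```csv
SNAPSHOT,EMPLOYEE_ID,SURVEY_QUESTION_1,SURVEY_QUESTION_2
2022-12-09,E-001,4,3.5
2022-12-09,E-002,5,2
```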
Worklytics supports importing Human Resource Information System (HRIS) data from CSV files to our platform. This data defines basic information about your personnel: join/leave date, manager, and potentially organizational structure (eg, dept, role, team, etc). We require this data to be provided in columns as specified below.
Each data row in the CSV will be a point-in-time snapshot of the HRIS record for an employee, as the employee appeared in your HRIS on the `SNAPSHOT` date. The effective time for the snapshot is provided either as the value of the `SNAPSHOT` column (preferred), or as a suffix (`YYYYMMDD`) of the filename immediately prior to the file extension (eg, `employee_snapshot_YYYYMMDD.csv`).

If a `SNAPSHOT` field value is provided, it takes precedence over the value parsed from the filename suffix. You MAY provide snapshots for multiple dates in a single file, but in that case you MUST provide a `SNAPSHOT` value for each row.
We recommend snapshots be generated WEEKLY, and that you initially provide one snapshot per week for the period you intend to analyze. Eg, if it's currently June 2024 and you want analysis back to the beginning of the year, you'd send roughly 24 weeks' worth of snapshots. It is simplest to split that into one file per week, even if you're providing the `SNAPSHOT` column.
Snapshots should include all active employees and all terminated employees, unless the latter were terminated before the earliest snapshot date you're providing (eg, the start of the period for analysis).
Individual employees can be excluded from processing by providing a `WORKLYTICS_SAMPLE` column with a value of `false` for the employees to be excluded.
Suggested filename: `employee_snapshot_YYYYMMDD.csv`

| Field Name | Required | Type | Description |
|---|---|---|---|
| SNAPSHOT | Unless in filename | DATE | Snapshot date, the moment in time this row represents |
| EMPLOYEE_ID | Yes | STRING | Employee id in org's HR schema |
| EMPLOYEE_EMAIL | Yes | STRING | Employee's main email address |
| JOIN_DATE | Yes | DATE | Join date |
| LEAVE_DATE | Yes | DATE | Leave date, or empty if current employee |
| LEAVE_REASON | No | STRING | VOLUNTARY or INVOLUNTARY |
| MANAGER_ID | Yes | STRING | Employee id of the manager as of the snapshot date |
| MANAGER_EMAIL | Only if manager not in the file (1) | STRING | Email of the manager as of the snapshot date |
| WORKLYTICS_SAMPLE | No | BOOLEAN (2) | Whether the individual should be included in data processing |
| OFFICE_TZ | No | STRING | Time zone the employee is based in (3) |
| OFFICE_START_HOURS | No | TIME_OF_DAY | When the working day starts, HH:MM (24H format) |
| OFFICE_END_HOURS | No | TIME_OF_DAY | When the working day ends, HH:MM (24H format) |
| `CG_*` | No | STRING | Groups the employee belongs to at the snapshot date. Multiple custom group fields can be included within the file. See details (4) |
| GITHUB_USERNAME | No | STRING | A GitHub username to associate with the employee (see connectors/github) |
| GITHUB_USERNAME_ALT | No | STRING | An additional GitHub username to associate with the employee (useful if using both GitHub Cloud and on-prem) |
(1): If you include `MANAGER_EMAIL` and are using the pseudonymization proxy, ensure you modify the rules to pseudonymize it.

(2): Boolean types are case-insensitive, and can be provided as `true`, `false`, `1`, `0`, `yes`, `no`, `y`, `n`, `t`, `f`. Any other value will be considered `false`.
If the `WORKLYTICS_SAMPLE` field is not provided in any file, we assume `true`, meaning everyone in the file is included in data processing. If at least one file contains the `WORKLYTICS_SAMPLE` field, we assume it is in use, and the most recent value found for each employee prevails. If no value is found for an employee, we assume `false`.
(3): We use IANA time zones. See https://github.com/eggert/tz/blob/master/zone1970.tab
(4): Group fields are used to indicate that the employee belongs to a certain group at the snapshot date. Custom Groups can represent any kind of grouping in your organization that is meaningful for analysis purposes. These groupings are used later to filter and aggregate data in Worklytics. Some examples of groups are: teams, office, business unit, region, division, department, role, etc.

The fields must start with the `CG_` prefix to distinguish them from others. Once imported, the prefix is discarded and the group name is normalized. For example:

| Header in file | Custom Group Name in Worklytics |
|---|---|
| cg_location | LOCATION |
| CG_DIVISION | DIVISION |
| CG_SQUAD | SQUAD |

Because the field name is normalized, two fields like `CG_BUSINESS_UNIT` and `CG_BUSINESSUNIT` will be considered the same and will produce an error; make sure to use the same name for the same group consistently over time.
By default, only 10 custom group fields are allowed. If you need more, please contact sales@worklytics.co.
Given this employee information:

- Karen joined the company on 2021-06-15 as Senior Sales Rep. She's based in London, UK.
- Jaime joined the company on 2022-10-05 as Junior Sales Rep. He's based in London, UK. Karen is his manager.
- Alice joined the company on 2022-10-05 as Junior Sales Rep. She's based in Madrid, Spain. Karen is her manager.
- Alice left the company on 2022-10-20.
- All work in the Sales team.

Here's an example of weekly snapshots during October, generated on Mondays, using the ISO date format.
File: employee_snapshot_20221003.csv
Only Karen was in the company that week
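An illustrative rendering (ids, emails, and custom-group columns are hypothetical; the `SNAPSHOT` column is omitted since the filename carries the date):

```csv
EMPLOYEE_ID,EMPLOYEE_EMAIL,JOIN_DATE,LEAVE_DATE,MANAGER_ID,CG_TEAM,CG_ROLE,CG_LOCATION
E-001,karen@acme.com,2021-06-15,,,SALES,SENIOR_SALES_REP,LONDON
```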
File: employee_snapshot_20221010.csv
Notes: Jaime and Alice are new in the system and are now included in the file.
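Illustrative rendering (same hypothetical ids):

```csv
EMPLOYEE_ID,EMPLOYEE_EMAIL,JOIN_DATE,LEAVE_DATE,MANAGER_ID,CG_TEAM,CG_ROLE,CG_LOCATION
E-001,karen@acme.com,2021-06-15,,,SALES,SENIOR_SALES_REP,LONDON
E-002,jaime@acme.com,2022-10-05,,E-001,SALES,JUNIOR_SALES_REP,LONDON
E-003,alice@acme.com,2022-10-05,,E-001,SALES,JUNIOR_SALES_REP,MADRID
```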
File: employee_snapshot_20221017.csv
Notes: no changes
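Illustrative rendering (identical rows; only the snapshot date in the filename moves forward):

```csv
EMPLOYEE_ID,EMPLOYEE_EMAIL,JOIN_DATE,LEAVE_DATE,MANAGER_ID,CG_TEAM,CG_ROLE,CG_LOCATION
E-001,karen@acme.com,2021-06-15,,,SALES,SENIOR_SALES_REP,LONDON
E-002,jaime@acme.com,2022-10-05,,E-001,SALES,JUNIOR_SALES_REP,LONDON
E-003,alice@acme.com,2022-10-05,,E-001,SALES,JUNIOR_SALES_REP,MADRID
```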
File: employee_snapshot_20221024.csv
Notes: Alice quits. The system will automatically bind Alice's data to her work stint.
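Illustrative rendering (Alice's row now carries her `LEAVE_DATE`):

```csv
EMPLOYEE_ID,EMPLOYEE_EMAIL,JOIN_DATE,LEAVE_DATE,MANAGER_ID,CG_TEAM,CG_ROLE,CG_LOCATION
E-001,karen@acme.com,2021-06-15,,,SALES,SENIOR_SALES_REP,LONDON
E-002,jaime@acme.com,2022-10-05,,E-001,SALES,JUNIOR_SALES_REP,LONDON
E-003,alice@acme.com,2022-10-05,2022-10-20,E-001,SALES,JUNIOR_SALES_REP,MADRID
```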
1. Log in to Worklytics (your user needs the Data Connection Admin role).
2. Select the appropriate connector depending on your needs:
   - HRIS Data Import connector: processes the data input directly.
   - HRIS Data Import via Psoxy connector: performs pseudonymization of the data input before processing it.
3. Whichever connector you've chosen, follow the instructions to complete the connection:
   - Select the Parser setting: `EMPLOYEE_SNAPSHOT`.
   - Select the Date format setting: `US` or `ISO`, matching your HRIS export date format.