A more technical approach to security & privacy
By Bart Muskala
updated 1 day ago
In this document, we highlight our efforts in different domains to ensure data security and to protect the privacy of the individuals we track.
Users are identified via a unique ID, the adid (IDFA for Apple devices and AAID for Android devices). This adid is linked to an Accurat user token and is stored in our Identity database. The users' locations are stored in a separate database. Location data, whether coordinates or processed data, is tied to the Accurat user token and is never stored in combination with the user's adid.
A user who is tracked by two different SDK setups for two different customers is tracked twice, and their data is stored twice, linked to two unique Accurat user tokens, to provide clear data separation for each Accurat customer.
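The separation described above can be illustrated with a minimal sketch. The class names, the in-memory dictionaries and the use of random tokens are assumptions made for illustration only; they do not describe the actual Accurat schema or token format.

```python
import secrets


class IdentityStore:
    """Hypothetical stand-in for the Identity database: (customer, adid) -> token."""

    def __init__(self):
        self._tokens = {}

    def token_for(self, customer_id: str, adid: str) -> str:
        key = (customer_id, adid)
        if key not in self._tokens:
            # A random token cannot be traced back to the adid without this store.
            self._tokens[key] = secrets.token_hex(16)
        return self._tokens[key]


class LocationStore:
    """Hypothetical stand-in for the location database: keyed by token only."""

    def __init__(self):
        self.rows = []

    def add(self, token: str, lat: float, lon: float) -> None:
        # The adid is deliberately not a parameter, so it can never end up here.
        self.rows.append({"token": token, "lat": lat, "lon": lon})


identities = IdentityStore()
locations = LocationStore()

adid = "38400000-8cf0-11bd-b23e-10b96e40000d"  # made-up example advertising ID
token_a = identities.token_for("customer-a", adid)
token_b = identities.token_for("customer-b", adid)

# The same device is stored twice, once per customer, under different tokens.
locations.add(token_a, 51.2194, 4.4025)
locations.add(token_b, 51.2194, 4.4025)
```

In this sketch, someone with access to the location rows alone sees two unrelated tokens and no advertising ID; linking them requires access to the identity store as well.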
The location data is stored in BigQuery, the data warehouse of Google Cloud Platform (GCP). BigQuery automatically encrypts all data using AES before it is written to disk. The data is automatically decrypted when read by one of our authorized users, of whom there are only a limited number. Google manages the key encryption keys used to protect the data. To protect all data as it travels over the Internet during read and write operations, Google Cloud encrypts it in transit using Transport Layer Security (TLS). GCP is ISO/IEC 27001 compliant.
MySQL is used for the Identity database. This database contains, among other things, the adid of the users, the (GDPR) consents they have given and the audiences they belong to.
In the near future, the most sensitive data (coordinates) will be additionally encrypted using AES. An additional key will be required to decrypt the coordinates and trace them back to actual locations for users.
For authentication on our API we use HMAC over TLS-encrypted connections. For user authentication on our admin section (which contains only aggregated data and no PII of data subjects) we have an extended user management system: users are flagged as inactive after 3 months of inactivity, every user is given an expiry date on creation, each user has one of several roles that limit their abilities, passwords are encrypted and never emailed, passwords are checked for strength, etc.
OAuth and IP restrictions are planned for our API, as are app restrictions for our SDK. To improve user management in our admin, SAML 2.0 will be provided for enterprise customers. TFA (two-factor authentication) will be integrated in the admin.
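As an illustration of HMAC-based request authentication, here is a minimal sketch using Python's standard library. The signed fields (method, path, timestamp, body), the SHA-256 digest and the 5-minute clock tolerance are assumptions chosen for the example; the actual Accurat API signing scheme is not specified in this document.

```python
import hashlib
import hmac
import time


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 signature over the request parts (hypothetical layout)."""
    message = b"\n".join([method.encode(), path.encode(), str(timestamp).encode(), body])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str, body: bytes,
                   timestamp: int, signature: str,
                   now=None, max_skew: int = 300) -> bool:
    """Recompute the signature server-side and compare it in constant time."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > max_skew:
        # Reject stale requests to limit replay of captured signatures.
        return False
    expected = sign_request(secret, method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)


# Usage: the client signs, the server verifies with the same shared secret.
secret = b"example-shared-secret"  # placeholder, never hard-code real secrets
ts = 1_700_000_000
sig = sign_request(secret, "POST", "/audiences", b'{"name": "demo"}', ts)
ok = verify_request(secret, "POST", "/audiences", b'{"name": "demo"}', ts, sig, now=ts + 10)
```

The shared secret itself never travels with the request; only the signature does, and any change to the signed parts invalidates it.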
Our platform is protected by Cloudflare, including Advanced DDoS Protection delivered through Cloudflare's fast, globally distributed protection services. All traffic is encrypted using SSL/TLS to prevent data theft and tampering. The Cloudflare Web Application Firewall provides enhanced security through a built-in ruleset that stops a wide range of application attacks; for example, it protects against SQL injection, bad web robots, automated (LFI) attacks and command web trojans.
Twice a day, Security Health Analytics (a Google Cloud Platform service) scans different GCP resources (e.g. servers, BigQuery, users, disks, API key usage, TFA usage, encryption, SSL policies, potential open/public access…) to detect and report vulnerabilities and to suggest improvements.
Our rights management policy takes a 'no rights by default' approach. This means that no rights are granted until an informed and repeated request for specific access has been made to our CTO. Only a limited number of users have access to actual user data (PII). Each user's access rights are recorded and managed centrally.
We use TFA (two-factor authentication) for access to the platforms and tools we use, and we have a strict 'sleep mode' policy that requires developers to log out when they leave their computers unattended.
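A 'no rights by default' check can be sketched as follows. The registry class, the resource names and the "cto" approver label are hypothetical and serve only to show the deny-by-default pattern, not the tooling actually in use.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRegistry:
    """Hypothetical central registry: a user has no rights until one is granted."""

    grants: dict = field(default_factory=dict)  # user -> set of resources

    def grant(self, user: str, resource: str, approved_by: str) -> None:
        # Assumption for this sketch: only the CTO can approve a grant.
        if approved_by != "cto":
            raise PermissionError("access must be approved by the CTO")
        self.grants.setdefault(user, set()).add(resource)

    def is_allowed(self, user: str, resource: str) -> bool:
        # Unknown users and unknown resources both fall through to "deny".
        return resource in self.grants.get(user, set())


registry = AccessRegistry()
before = registry.is_allowed("dev-1", "identity-db")
registry.grant("dev-1", "identity-db", approved_by="cto")
after = registry.is_allowed("dev-1", "identity-db")
```

Because the lookup defaults to an empty set, forgetting to configure a user results in no access rather than accidental access.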
Procedures & standards
Our SDK publishing flow has several safeguards before code can be deployed: code has to be updated on GitLab, which requires two-step authentication; a limited number of developers have access, each to a limited domain (e.g. only iOS, only Android); code commits are logged in GitLab; additional authentication is required to deploy; and only versions authorized through internal approval are numbered and can be used by customers. A similar approach is in place for moving API commits from staging to production.
To avoid profiling users based on sensitive data, our systems have a blacklist of POIs that could reveal user behaviour relating to political, religious or sexual preferences or to health conditions. Matches with those potentially sensitive locations are therefore never stored.
Our policy includes guidelines on ethical conduct and on the limits of using our service. Customers have to acknowledge their good intentions with each audience or POI they create.
We limit the number of external libraries to avoid external dependencies, and all of them come from trusted parties (such as Google).
All data manipulations (e.g. add audience, edit campaign, delete user, group POIs, duplicate audience, ...) are logged for the admin section. All development efforts (e.g. code commits, SDK deployment, documentation deployment, ...) are logged as well.
Multiple monitoring systems are in place. Automated real-time systems signal anomalies and errors, which also helps to identify potential fraud and malicious attempts at data manipulation or user access, for both the SDK and the API.
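The blacklist rule above amounts to filtering matches before they reach storage. The following is a minimal sketch; the POI identifiers and the shape of a match record are invented for illustration.

```python
# Hypothetical identifiers of POIs flagged as potentially sensitive.
SENSITIVE_POI_IDS = frozenset({"poi-clinic-17", "poi-party-office-3", "poi-temple-8"})


def storable_matches(matches, blocked=SENSITIVE_POI_IDS):
    """Drop every match against a blacklisted POI so that it is never persisted."""
    return [m for m in matches if m["poi_id"] not in blocked]


incoming = [
    {"token": "t1", "poi_id": "poi-supermarket-42"},
    {"token": "t1", "poi_id": "poi-clinic-17"},       # sensitive, must be dropped
    {"token": "t2", "poi_id": "poi-gas-station-5"},
]
to_store = storable_matches(incoming)
```

Filtering before the write, rather than deleting afterwards, means the sensitive match never exists in the database or its backups.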
Both our SDK and API are monitored by Sentry. Stackdriver monitoring is used to monitor our data setup and algorithms on GCP.
Data storage & retention
Data is stored in the Belgian (MySQL and our processing servers) and European (BigQuery) data centers of Google Cloud Platform (GCP), which is ISO/IEC 27001 compliant.
We make a distinction between storing the raw data we collect (e.g. coordinates) and storing the resulting processed data (e.g. matches, audiences). Raw data is stored for a shorter period of time, between 30 days and 1 year, depending on the customer's requirements. Processed data is stored for a period of 3 years.
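The two retention periods can be expressed as a small expiry check. This is a sketch under stated assumptions: a year is approximated as 365 days, and the function names are hypothetical rather than part of our platform.

```python
from datetime import datetime, timedelta, timezone

# Processed data (matches, audiences): 3 years, approximated as 3 * 365 days.
PROCESSED_RETENTION = timedelta(days=3 * 365)


def raw_retention(days: int) -> timedelta:
    """Per-customer raw data retention, limited to the 30 days to 1 year range."""
    if not 30 <= days <= 365:
        raise ValueError("raw data retention must be between 30 and 365 days")
    return timedelta(days=days)


def is_expired(stored_at: datetime, retention: timedelta, now: datetime) -> bool:
    """True when a record has outlived its retention period and should be purged."""
    return now - stored_at > retention


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
stored_at = datetime(2024, 4, 1, tzinfo=timezone.utc)  # 61 days before `now`

expired_30 = is_expired(stored_at, raw_retention(30), now)
expired_90 = is_expired(stored_at, raw_retention(90), now)
```

With these dates, a customer on 30-day raw retention would have the record purged, while a customer on 90-day retention would still keep it.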
Daily backups are created. Data collection for a user is stopped once the user revokes their consent. When a user exercises the right to erasure, all of their data is wiped from our live database configuration. A backup may still contain user data for a maximum of 30 days before it is wiped for good. Restoring a database takes revoked user consents into account and removes the data of subjects who chose to be forgotten.
We are currently working on improved anonymisation. Although location data can be considered personal data (PII), we store location data and user identifiers separately from each other. Favorite or frequent locations such as home and work are stored separately (see data partitioning), and all raw coordinates and data are encrypted in our BigQuery environment (see encryption).
Further anonymisation efforts are planned and ongoing, with different measures such as encryption of coordinates and on-device processing.
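The restore rule above can be sketched as a filter applied while loading a backup. The row layout, the `erased_tokens` set and the function name are assumptions for illustration; the real restore procedure is not detailed in this document.

```python
from datetime import datetime, timedelta, timezone

MAX_BACKUP_AGE = timedelta(days=30)  # backups older than this should be wiped


def restore(backup_rows, backup_taken_at, erased_tokens, now):
    """Return only the rows that may be restored.

    Rows of users who asked to be forgotten after the backup was taken are
    dropped, so a restore does not bring erased data back to life.
    """
    if now - backup_taken_at > MAX_BACKUP_AGE:
        raise ValueError("backup is older than 30 days and should no longer exist")
    return [row for row in backup_rows if row["token"] not in erased_tokens]


now = datetime(2024, 6, 15, tzinfo=timezone.utc)
taken_at = datetime(2024, 6, 1, tzinfo=timezone.utc)
backup = [
    {"token": "t1", "poi_id": "poi-1"},
    {"token": "t2", "poi_id": "poi-2"},  # t2 requested erasure on 10 June
]
restored = restore(backup, taken_at, erased_tokens={"t2"}, now=now)
```

This assumes the list of erasure requests is kept outside the backup being restored, otherwise the restore itself would lose track of who asked to be forgotten.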