Data & Privacy Protection

The importance of data and annotation

To guarantee the quality of our speech-to-text models, the data used to train them should closely resemble the data they will transcribe in practice. This means that the models need to be familiar with words that frequently occur together in the same context, and with the terminology used in specific (health)care domains.

We therefore continuously improve our models by annotating incoming data. During annotation, a person reviews the transcript generated by our speech-to-text model and corrects it where necessary. We then use the corrected data to train and evaluate our models, which ultimately makes them more accurate.

Sensitive data

Annotation requires a person to review the data, and that data may contain sensitive information, which creates risks. That’s why we take the following measures to ensure that the data is annotated safely:

1. We do not have access to the data by default

We may only access the data for training purposes if the healthcare organization has explicitly given its consent. If it has, we jointly decide who has access to the data and how long the data may be retained.

2. The data of each healthcare organization is stored physically separately

We store the data of each healthcare organization in a separate database, which keeps each organization's data physically separated from that of other organizations. When the retention period expires, we delete the entire database.

3. All agreements between us and a healthcare organization are documented in a processing agreement

We document our agreements with a healthcare organization in a processing agreement. This agreement follows the standard processing agreement that Brancheorganisaties Zorg (BoZ) drew up for the healthcare sector together with healthcare organizations, suppliers and experts.

4. We employ our annotators directly

Our annotators are employed by us, and we only hire annotators who have a Certificate of Conduct (Dutch: VOG) showing that they have not been involved in any information security incidents. If an annotator is given access to a healthcare organization's data, we also share their Certificate of Conduct with that organization.

5. Annotators only annotate in our own annotation environment

We built our own annotation environment, which gives us full control over who can access the data. This annotation environment and our databases are hosted on Microsoft Azure in the Netherlands and are therefore subject to Dutch data processing laws and regulations.

Every annotator gets their own account in our Azure Active Directory. When their employment ends, their account is deleted, which automatically revokes their access to the data. In addition, the annotation environment can only be accessed from our own network via a VPN, and all data is encrypted with modern encryption techniques both in transit and at rest.

If you have any questions or suggestions about Attendi’s security, please contact our Privacy Officer: berend@attendi.nl.