How long do you store data?

From Sustainability Methods

10 years. This is the most common answer I get whenever I ask someone what they think is an appropriate amount of time for data storage.

In most cases, there are no fixed rules for how long you should store data, only rules of thumb. While 10 years may seem like a very long time, deciding how long to store data requires some careful consideration. Below is a non-exhaustive list of factors to take into account.

Legal

As data plays an increasingly important role in our day-to-day lives, not only as researchers but also as individuals, data privacy and security have become more sensitive issues. Governing bodies (of individual countries or unions of countries) have therefore been taking more active steps to address them. In the EU, for example, the General Data Protection Regulation (GDPR) requires (as of November 2019) that personal data be limited to what is necessary for the purpose for which it was collected, and that it be kept in a form that identifies individuals for no longer than that purpose requires. There are further intricacies to keep in mind, such as the exceptions GDPR allows for archiving and scientific research purposes, subject to appropriate safeguards. It is therefore very important to stay informed about the data privacy and security laws of the regions in which you and/or your stakeholders operate.

Organizational Policy

Some organizations have strict policies that determine how long data should be stored. These policies generally already account for the current legal landscape, which makes the decision quite straightforward: you store the data for as long as the organization mandates.

Nature of Research and Data Set

Some research is more sensitive than other research. For example, suppose you are conducting medical research in which you work with patients' data. In this case, the data you have access to, what you do with that data, and the next steps your organization takes based on your work are all highly sensitive. Compare this with a situation in which you are an engineer who wants to learn about a process and works with simulated data. Here, too, your work is important and can have real consequences. However, one of the biggest differences between these two contexts is the data involved: data on people's health and behavior naturally has to be handled with far more care than simulated data.

Method of Storage of Data

If you can keep the data you use for your research private and secure for as long as it needs to be stored, then following the organizational policy (if one exists) or the 10-year heuristic (if no other guidelines are available) is fine. However, if you lack the skills or resources to ensure data privacy and security, you should reconsider storing the data long-term. Either use services that ensure data security for you, or purge the data as soon as its purpose has been fulfilled (which is not ideal).
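The decision logic in the paragraph above can be summarized in a few lines of code, which may be useful if you track retention dates for several datasets. The following Python sketch is purely illustrative: the names `RetentionContext`, `planned_deletion_date`, `add_years` and their fields are hypothetical, and the sketch does not check legal requirements such as the GDPR, which always take precedence over both organizational policy and the 10-year rule of thumb.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# The common rule of thumb, not a legal requirement.
DEFAULT_RETENTION_YEARS = 10


@dataclass
class RetentionContext:
    # All field names are hypothetical, chosen for illustration only.
    org_policy_years: Optional[int]  # period mandated by your organization, if any
    can_secure_long_term: bool  # True if you (or a service you use) can keep the data secure
    purpose_fulfilled_on: Optional[date]  # date the data stopped being needed, if known


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def planned_deletion_date(collected_on: date, ctx: RetentionContext) -> Optional[date]:
    """Return a planned deletion date, or None if it cannot be fixed yet."""
    if not ctx.can_secure_long_term:
        # Without adequate security, purge the data once its purpose is fulfilled.
        # None means "delete as soon as the purpose has been fulfilled".
        return ctx.purpose_fulfilled_on
    # Follow the organizational policy if there is one, else the 10-year heuristic.
    if ctx.org_policy_years is not None:
        years = ctx.org_policy_years
    else:
        years = DEFAULT_RETENTION_YEARS
    return add_years(collected_on, years)


# Usage: no organizational policy, and the data can be kept secure.
ctx = RetentionContext(org_policy_years=None, can_secure_long_term=True, purpose_fulfilled_on=None)
print(planned_deletion_date(date(2019, 11, 1), ctx))  # 2029-11-01
```

Returning `None` rather than guessing a date keeps the "purge once the purpose is fulfilled" case explicit, so it cannot silently fall back to the 10-year default.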

As mentioned earlier, this list is not exhaustive: depending on the situation, other factors may also need to be considered.


The author of this entry is Prabesh Dhakal.