Machine Learning’s Most Valuable Asset: Considerations for Monetising Data

One of the things I hear time and time again when I speak to inventors is that data is the most valuable asset when it comes to machine learning; after all, a machine learning model is only as good as the quality and quantity of data that it is trained on.

In many fields, good-quality data that can be used for training ML models is proprietary and can be plagued (depending on your viewpoint) with privacy issues. Furthermore, as soon as you give a third party access to your data it can be copied, making it easy to lose control of it quickly and completely.

If data really is the most valuable asset, the burning question is therefore: can it be successfully protected and monetised?

Here are some points for thought when protecting and monetising your data.

Be clear on what you have and each individual’s responsibilities

Conduct a data audit to determine what you have and how it is used. Work out what is in each database, who in your organisation has permission to access it and for which purposes. Just as employees should be aware when they access a trade secret, and of the responsibilities that flow from that access, they should also be made aware of their responsibilities when accessing and using company data. This reduces the risk of your employees sharing valuable data accidentally.
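By way of illustration only, the output of such an audit could be held as a simple register that records, for each dataset, which roles may use it and for which purposes. The sketch below is a minimal Python example under my own assumptions: the names DatasetRecord, register and check_access, and the example dataset and roles, are hypothetical and not drawn from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One row of a hypothetical data-audit register."""
    name: str
    description: str
    # Maps a role (rather than a named person) to the purposes
    # for which that role may use the dataset.
    permitted_purposes: dict[str, set[str]] = field(default_factory=dict)

    def may_access(self, role: str, purpose: str) -> bool:
        """Return True only if this role is cleared for this purpose."""
        return purpose in self.permitted_purposes.get(role, set())


# Example register with a single, made-up dataset.
register = {
    "sensor_logs": DatasetRecord(
        name="sensor_logs",
        description="Raw readings from deployed devices",
        permitted_purposes={
            "ml_engineer": {"model_training"},
            "support": {"fault_diagnosis"},
        },
    ),
}


def check_access(dataset: str, role: str, purpose: str) -> bool:
    """Look up a dataset in the register and check role and purpose together."""
    record = register.get(dataset)
    return record is not None and record.may_access(role, purpose)
```

Recording the purpose alongside the role means the register answers not only "who can see this?" but also "what are they allowed to do with it?", which is the distinction the audit is meant to draw out.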

Keep records on how the data was compiled and processed

Both copyright and database rights can be used to protect databases in the UK and the EU. Copyright protects original (e.g. creative) selections or arrangements of material in a database. The contents of a database, by contrast, are protected by database rights if there has been a substantial investment in obtaining, verifying or presenting the data. In both cases, it is therefore good practice to document how your data was collected and processed in order to evidence that these rights exist.
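One possible way of keeping such records, sketched here in Python purely as an illustration, is to log each collection or processing step together with a timestamp and a cryptographic hash of the data as it stood at that point. The function names provenance_entry and to_log_line and the field names are my own assumptions, and whether a log of this kind is sufficient evidence in any given case is a legal question rather than a technical one.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_entry(data: bytes, step: str, source: str, operator: str) -> dict:
    """Describe one collection or processing step applied to a dataset.

    The SHA-256 digest ties the entry to the exact bytes that existed
    after the step, without storing the data itself in the log.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,          # e.g. "collected", "verified", "de-duplicated"
        "source": source,      # where the data came from
        "operator": operator,  # who carried out the step
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def to_log_line(entry: dict) -> str:
    """Serialise an entry as one JSON line, suitable for an append-only log."""
    return json.dumps(entry, sort_keys=True)


# Example: record that a (made-up) batch of readings was manually verified.
entry = provenance_entry(
    data=b"reading_1,reading_2,reading_3",
    step="verified",
    source="field trial, site A",
    operator="data team",
)
line = to_log_line(entry)
```

Each line could then be appended to a file that is kept with the dataset, so that the investment in obtaining, verifying and presenting the data is documented as it happens rather than reconstructed later.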

Consider patent protection

Patent protection isn’t necessarily the first thing that comes to mind when we think of protecting a database, but Art 64(2) EPC explicitly extends the protection conferred by a patent for a process to the products directly obtained by that process. Furthermore, the EPO Guidelines state that:

“where a classification method serves a technical purpose, the steps of generating the training set and training the classifier may also contribute to the technical character of the invention if they support achieving that technical purpose.”

Thus, it seems theoretically possible to obtain a patent with claims to a method of processing data (for example, to optimise the data for use in training a machine learning model) whose protection also extends to a database produced by that method. If your data is modified in a way that improves a technical process, it is therefore worth considering patent protection.

One size doesn’t necessarily fit all

Once you have audited your data collections, the next question is how valuable each dataset is to your business. Which data gives you a competitive advantage? While you may want to keep the data that contributes most to your business as a trade secret, this may be overkill for other data assets that provide less benefit to your organisation. It is this data that is ripe for monetisation.

Derived products or the real thing?

Once the data has been selected for monetisation, a range of options is open to you, such as licensing or selling the data, e.g. via an IP or data broker.

Both the data itself and products derived from the data, such as trained models or other predictive tools, may be sold to third parties. Derived products may be hidden behind Application Programming Interfaces (APIs) and made available to third parties, e.g. using a subscription model.

While it may feel safer to sell, or give access to, derived products, it is worth noting that this doesn’t necessarily protect the underlying data, as datasets can be reconstituted from machine learning models. Extraction attacks, whereby an attacker makes large numbers of requests to a model, can be used to build up a database of input and output pairs, or to probe model boundaries to determine the underlying logic. Thus both training data and model structure can potentially be reconstructed simply by querying a model.
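To make the idea of probing a model boundary concrete, here is a deliberately simplified Python sketch. It assumes a toy "model" with a single numeric input and a hidden decision threshold, exposed only through a predict function. The names make_black_box and extract_boundary, and the threshold value, are hypothetical. Real extraction attacks against real models are considerably more involved, and this toy does not attempt to reconstruct training data.

```python
def make_black_box(threshold: float):
    """Hypothetical stand-in for a model served behind an API.

    The caller can obtain predictions but cannot inspect the threshold.
    """
    def predict(x: float) -> int:
        return 1 if x >= threshold else 0
    return predict


def extract_boundary(predict, lo: float, hi: float, queries: int = 30):
    """Estimate the decision boundary using only calls to predict().

    Assumes predict(lo) == 0 and predict(hi) == 1, then halves the
    interval containing the boundary with every query (binary search).
    """
    observed = []  # the attacker's growing database of (input, output) pairs
    for _ in range(queries):
        mid = (lo + hi) / 2
        label = predict(mid)
        observed.append((mid, label))
        if label == 1:
            hi = mid  # boundary is at or below mid
        else:
            lo = mid  # boundary is above mid
    return (lo + hi) / 2, observed


api = make_black_box(threshold=0.37)  # the "secret" the attacker never sees
estimate, pairs = extract_boundary(api, 0.0, 1.0)
```

With only 30 queries, the estimate lands within about a billionth of the hidden threshold, and the attacker also walks away with a labelled set of input and output pairs. The point is simply that access to predictions alone can leak the logic behind them.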

As is so often the case, there doesn’t seem to be a magic bullet that allows data to be protected and easily monetised. In any case, a deliberate approach with a clear paper trail is likely to offer the best opportunity to realise the value of your datasets while also maintaining your rights.

This is for general information only and does not constitute legal advice. Should you require advice on this or any other topic then please contact your usual Haseltine Lake Kempner advisor.
