6 May 2012

Maintaining your OLI - a possible solution

To briefly summarize a previous post and article on online identities (OLIs): they should be as complete as possible to improve the quality of online networked learning, and, at the same time, they should be minimal because of the risks that come with loss of privacy.

Differential access

As a learner, you need to grant others access to your data. These others are providers of learning services: the delivery of personalized content, help with finding peers with whom you could collaborate on a project, suggestions for peers or paid experts who could tutor you, and so on. Only through access to your data can these providers offer you personal learning opportunities. The fewer the data, the more generic their proposals; the more data, the more specific. And clearly, since personalized learning opportunities allow you to learn more effectively, more efficiently and perhaps even more satisfyingly, it is in your interest to provide those data. However, not everyone out there has the best of intentions. To avoid misuse of your data (to imagine what could happen, just think of the way your email address is exploited for sending you spam), you should grant nobody access to your data (see earlier post), but this would of course defeat the purpose of collecting data on you in the first place. The way out of this dilemma is to provide differential access to your data, i.e. some parties whom you trust are allowed to see and do more than others whom you trust less. This solution has a few problems.


First, to increase data security, data need to be encrypted, not only in transit but also on the server that stores them. If not, anybody who happens to spot the URL at which particular data reside could access them. A simple, single-key encryption scheme, moreover, is not safe enough. Loss of the key grants access to all your data to whoever happens to find it. And loss is actually quite probable, as every party who needs to access your data needs to possess that single key. The solution is a public-private encryption scheme. In such a scheme one key is made publicly available, the other is kept private; also, one key is used to encrypt, the other to decrypt. If I encrypt some message with my private key, anybody who has my public key (and that is in principle everybody, since I publish the key on, say, my website) can verify that this was my message and nobody else's (authentication). If I encrypt some message with the intended receiver's public key, I can be sure that only s/he can decipher my message, thus ensuring the message's secrecy: only the intended receiver and nobody else can see it. So, I could encrypt a subset of my data with the public key of some education provider A, thus ensuring that it is only they who have access to those data. If they lose my data by accident, I know it is they who are responsible. I could do the same for some company B, which provides assessments, or C, which is a job mediator, etc. The upshot is that public-private key encryption allows me to provide differential access to my data.
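The two uses of a key pair described above can be made concrete with a toy RSA sketch. The tiny primes and the absence of padding make it wholly insecure; it only illustrates the asymmetry between the public and the private key, which real systems implement with vetted cryptographic libraries.

```python
# Toy RSA illustration of the two uses of a public/private key pair.
# Insecure by design (tiny primes, no padding); for demonstration only.

def make_keypair():
    p, q = 61, 53                  # two small primes (real keys use huge ones)
    n = p * q                      # the modulus, part of both keys
    phi = (p - 1) * (q - 1)
    e = 17                         # public exponent
    d = pow(e, -1, phi)            # private exponent = modular inverse (Python 3.8+)
    return (e, n), (d, n)          # (public key, private key)

public_key, private_key = make_keypair()
m = 42                             # a message, encoded as a number below n

# Secrecy: encrypt with the receiver's PUBLIC key; only the holder of the
# matching private key can decrypt.
ciphertext = pow(m, public_key[0], public_key[1])
assert pow(ciphertext, private_key[0], private_key[1]) == m

# Authentication: "encrypt" (sign) with my PRIVATE key; anyone holding my
# public key can verify the message came from me and nobody else.
signature = pow(m, private_key[0], private_key[1])
assert pow(signature, public_key[0], public_key[1]) == m
```

Note that the same mathematical operation serves both purposes; only the order in which the two keys are applied differs.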

Access policies

However, the system as described is unmanageable. There are so many data about us out there that it would rapidly overtax individual data owners to identify, store and provide access to their data, let alone keep a record of who has rights to which data. In the above, I mentioned three parties A, B and C with whom I might want to share data. In actual fact, that number is much larger and indeed includes companies (or individuals) I am not even acquainted with. The solution to this manageability problem is to specify sets of access policies for sets of providers: for instance, a policy for public universities, one for private education providers, one for prospective employers, one for government agencies, etc. Each policy would then have to be coupled to a public-private key pair, related on the one hand to my policy and on the other to the key pairs of the institutions that constitute the set (otherwise I would have to negotiate with each institution to issue a pair to me, which, multiplied by all the users who address them, would make things unmanageable for them). Also, my policies are likely to evolve over time: including some data in the data set and excluding others, admitting new providers to the set, removing particular old ones. This updating should be easy to do, making use of the same public-private key pair, lest I need to start negotiating again with all the educational providers that are, will be, or will no longer be covered by my policies. These are quite complicated issues.
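The idea of one policy per set of providers, rather than per provider, can be sketched as follows. All field names, categories and values here are invented for illustration; a real system, such as the ciphertext-policy attribute-based encryption treated in Ibraimi's thesis, enforces the policies cryptographically rather than with plain dictionaries.

```python
# Hypothetical sketch: access policies defined per CATEGORY of provider.
# All names and values are invented; real enforcement is cryptographic.

profile = {
    "courses_completed":  ["statistics", "networked learning"],
    "assessment_scores":  {"statistics": 8.5},
    "employment_history": ["Open University"],
    "email":              "learner@example.org",
}

# One policy covers a whole set of providers, so the data owner maintains a
# handful of policies instead of one agreement per institution.
policies = {
    "public_universities":   {"courses_completed", "assessment_scores"},
    "private_providers":     {"courses_completed"},
    "prospective_employers": {"courses_completed", "employment_history"},
}

def visible_subset(profile, policies, category):
    """Return only the profile fields this category of provider may see."""
    allowed = policies.get(category, set())
    return {k: v for k, v in profile.items() if k in allowed}

# A private education provider sees my courses, but not my scores or email.
view = visible_subset(profile, policies, "private_providers")
```

Updating a policy, adding a field for all prospective employers, say, then means editing one entry rather than renegotiating with every institution in the set.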

The above situation was the starting point of a recently published (October 2011) piece of PhD research carried out by Luan Ibraimi at the University of Twente, Netherlands (Cryptographically Enforced Distributed Data Access Control). Please consult the original publication to see that the story is a bit more complicated than I portray it here. For those interested, the thesis of course describes in detail how these access policies may indeed be implemented.

Making it work

With the technology in place to provide differential access, we're not there yet, though. The solution as described demands the collaboration of all online (educational) service providers and all parties who store personal data. These are often the same parties, although each provides only some services and each has access to only part of your data profile. Sticking to education, the education providers need to make their public keys available and be willing to access your data through their private keys if they want to make you an offer. As it is in their interest to do so (you are a prospective customer), they would probably be willing. But the story is different for those parties who already have access to your data, whether education providers (the university or school where you study) or others (Facebook, LinkedIn, Google, ...). Making your data available to others implies helping the competition! This barrier can only be overcome by disallowing them to make use of your data, even though the data reside on their servers. That can only be done by routinely encrypting all personal data. For that they need to use your public key, to ensure that you and only you can decrypt the data again. It does not take a gigantic leap of the imagination to understand that these parties will not do so unless forced to, by law or by consumer pressure (see my post on Privacy and your online identity).
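What "routinely encrypting with the user's public key" buys can be shown in a few lines. The key numbers below are the same insecure toy-RSA values used for illustration only; a real deployment would use hybrid encryption (a fresh symmetric key per record, itself wrapped with the user's public key).

```python
# Sketch of encryption at rest under the USER's public key: the provider
# can store data it collects, but cannot read them back. Toy RSA numbers,
# insecure and for illustration only.

USER_PUBLIC_KEY = (17, 3233)       # published by the data owner
USER_PRIVATE_KEY = (2753, 3233)    # known only to the data owner

def provider_store(record_value):
    """The provider writes a record but keeps only the ciphertext."""
    e, n = USER_PUBLIC_KEY
    return pow(record_value, e, n)

def user_read(ciphertext):
    """Only the holder of the private key can recover the record."""
    d, n = USER_PRIVATE_KEY
    return pow(ciphertext, d, n)

stored = provider_store(99)        # e.g. a numerically encoded attribute
assert stored != 99                # opaque to the provider itself
assert user_read(stored) == 99     # transparent to the data owner
```

The data still physically reside on the provider's server, but exploiting them (or selling them on) would first require the owner's cooperation.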


There is no single solution to this problem. Data should always be anonymised where possible; for statistical analyses they can be, for personal recommendations of course they cannot. Data should also be allowed to degrade automatically, the degree of degradation perhaps coupled to particular policies (cf. the work of Harold van Heerde). However, these measures are only supportive. Any comprehensive solution as described above is predicated on regulation. That is, online service providers who keep personal data should be forced i) to put them squarely under the user's control, as for instance the Electronic Frontier Foundation argues, including the rights to remove and alter data, and ii) to routinely encrypt those data with the user's public key. The former brings ownership back to where it belongs; the latter allows data owners to exert that ownership, for instance along the lines discussed above. Together, this would solve the problem I raised in my previous post on the topic of online learner identities: data are brought back under their owner's control, and although they remain fragmented, differential access policies actually allow you to treat them as one large body of data.
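Automatic degradation, in the spirit of van Heerde's work, can be sketched as a record that is replaced by ever coarser generalisations as it ages, until nothing identifying remains. The field names, steps and thresholds below are invented for illustration.

```python
# Hedged sketch of automatic data degradation: older records are
# progressively generalised. Fields and thresholds are invented.

DEGRADATION_STEPS = [
    # (minimum age in days, transformation applied to the stored record)
    (30,  lambda r: {**r, "birth_date": r["birth_date"][:4]}),   # year only
    (180, lambda r: {**r, "city": "NL"}),                        # country only
    (365, lambda r: {"city": None, "birth_date": None}),         # anonymised
]

def degrade(record, age_days):
    """Apply, in order, every degradation step the record's age has reached."""
    for threshold, transform in DEGRADATION_STEPS:
        if age_days >= threshold:
            record = transform(record)
    return record

fresh = {"city": "Enschede", "birth_date": "1990-05-06"}
assert degrade(dict(fresh), 0) == fresh            # recent data stay detailed
assert degrade(dict(fresh), 200) == {"city": "NL", "birth_date": "1990"}
```

A policy, in the sense discussed above, could then determine which degradation schedule applies to which category of data.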

The kind of regulation that is needed may come about voluntarily, through a process of consensus formation among all interested parties, as in the creation of standards. Given the huge commercial interests at stake, I am not optimistic that this will succeed. Perhaps things could start this way, for instance at CEN/ISSS, but ultimately legislation will be needed, for instance as part of EU Commissioner Neelie Kroes's efforts for a Digital Agenda for Europe. Finally, this proposal restricts itself to creating the right conditions for users' differential control of their online data. It would clearly not be a good idea to give governmental institutions control of the data themselves; they are no less a party with interests than is any company. The role of the government should be restricted to drafting legislation and enforcing it. As this is going to be long in the works, private initiatives such as that of the QIY foundation, which operates along the lines sketched, are very welcome.


After I published this text, I found out that the Leibniz Center for Informatics at Schloss Dagstuhl, Germany had organized a Perspectives Workshop on the somewhat wider topic of online privacy. The abstracts of the talks have become available, but unfortunately the full manifesto hasn't yet. Nevertheless, as a Perspectives Workshop is intended to be an agenda-setting meeting of experts, it illustrates the significance of the topic.
Full reference: Fischer-Hübner, S., Hoofnagle, C., Rannenberg, K., Waidner, M., Krontiris, I., & Marhöfer, M. (2011). Online Privacy: Towards Informational Self-Determination on the Internet. Dagstuhl Manifestos, 1(1), 1-15. doi:10.4230/DagMan.1.1.1