Unique identifiers

We all have names or aliases that help identify us in our surroundings. Most of the time, a name or alias is not unique, and it doesn’t matter whether it is. In some cases, though, you might want an easy way for others to identify you even if you change affiliation, or even your name. ORCID is an example of an identifier used in the research community: a researcher registers with ORCID and is assigned a unique identifier, which is then included in research publications. Anyone can then take that identifier and look up the person in ORCID’s registry.

People are not the only ones needing identifiers: our computers have unique identifiers to help identify them on a network, our cars have vehicle identification numbers, books have ISBNs, and so on. Creating unique identifiers for assets, people and organisations is a key component of metadata for digital works: unique identifiers provide a way to identify a given work and its creator, and the identifier can then be used to look up information about the work in a registry, for example.

The way it works today is roughly as follows:

  1. You register your work in a registry (or, in the case of ISBNs, apply to your national library or a similar institution to get an ISBN)
  2. You receive a unique identifier
  3. You put that identifier on your work, labeling it as a particular kind of identifier (i.e., “the ISBN of this book is X”).
  4. People use that identifier in catalogues, databases, web sites, etc.

The identifiers received from a registry are guaranteed to be unique within that registry. That’s one of the reasons you can’t invent identifiers at random: they wouldn’t be guaranteed to be unique.

But how strict do we need to be? Is it enough if there’s only a 0.5% chance of someone picking the same identifier? What about 0.005%? If there were a way to generate a unique identifier without communicating with any other device, everyone could generate as many as they needed for their work or themselves. Only when they wanted to would they have to register an identifier in a registry.

UUID is one way of generating a practically unique identifier. It’s a 128-bit self-generated identifier: even if about 70 trillion such identifiers were generated worldwide, the probability of a collision among them would be roughly 0.00000004%. A UUID can be generated without communicating with any other device or service, meaning that a UUID could be generated in a camera, a phone, or any other recording device at the time of recording.
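
To make the numbers a bit more tangible, here is a minimal Python sketch. It assumes the random ("version 4") flavour of UUID, which the post does not specify: of its 128 bits, 6 are fixed and 122 are random. The helper name `collision_probability` is made up for this illustration, and the calculation is the standard birthday-bound approximation, not an exact figure.

```python
import math
import uuid

# Assumption: version 4 UUIDs, where 122 of the 128 bits are random.
RANDOM_BITS = 122


def collision_probability(count: int, random_bits: int = RANDOM_BITS) -> float:
    """Approximate chance that at least two of `count` random IDs are equal.

    Uses the birthday-bound approximation
    p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits)).
    """
    exponent = -count * (count - 1) / (2 * 2 ** random_bits)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


# Generated locally, with no network access and no registry involved.
identifier = uuid.uuid4()
print(identifier)

# 2**46 is roughly 70 trillion identifiers.
print(f"{collision_probability(2 ** 46):.2e}")
```

Under these assumptions the result is a probability of a few parts in ten billion, the same order of magnitude as the percentage quoted above.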

If we agree that a UUID is unique enough that it’s unlikely that two people will randomly generate the same identifier, we could simplify the process of generating identifiers significantly:

  1. You generate a UUID as the identifier and put it (in some cases automatically) into the work
  2. Optionally, if you want your identifier to be trusted, register it in a registry.

What are your thoughts about using UUIDs as unique identifiers?
