LectureBlog: Ben Shneiderman - Information Visualisation For Knowledge Discovery

Ben Shneiderman - HCIL, University of Maryland

www.cs.umd.edu/hcil  ben@cs.umd.edu  @benbendc

11th June 2012, Future Interaction Lab, Swansea University. 

 

Brief History

Ben has worked on design ideas, input devices, output devices, social media, help tutorials, teaching, search and visualisations.

Takes pride that his work serves 5 billion users across a diverse range of apps and interfaces.

His work has influenced the development of a wide variety of interfaces across a huge range of platforms.

 

Info Vis

The visual channel has huge bandwidth, and human visual perception is remarkable.

Trends, clusters and outliers are easy to spot; humans are very good at recognising patterns.

Many big businesses are buying visualisation companies.

E.g. Spotfire: a pioneer of software that ran real-time queries, letting users select data and apply multiple filters over millions of data points.

Used in big pharma, e.g. to identify the role of retinol in embryos and vision.

 

Over time, found that multiple 2D displays work better than fewer 3D ones; coordinated multiple displays are highly useful, all in 2D.

Displays of 100M pixels and more: spatially stable, with related views arranged in meaningful relationships via proximity.

E.g. corporate headquarters and NASA control rooms.

Smaller screens such as tablets and mobile phones have become increasingly popular too.

 

Information Visualisation Mantra

“Overview, zoom & filter, details on demand”

Show the user everything first, no matter how complex and messy.

Allow the user to zoom and/or filter the data.

Then allow the user to query details on demand.

 

Described in a paper that has since received around 2k citations and attracted lots of interest and discussion.

It tried to capture a human way of navigating data, similar to the way we navigate and interact with the world around us.

 

Info Vis: Data Types

SciVis: 1D linear, 2D map, 3D world

InfoVis: multivariate, temporal, tree, network

Multivariate, high-dimensional data can be difficult to visualise, display and use in InfoVis.

flowingdata.com, infovis.org, infoasthetics.com and infovis.net all have some great (and some not so great) examples of visualisations.

 

Why Visualise?

Anscombe’s Quartet:

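Four data sets, each with 11 (x, y) points. It is hard to see any patterns when the data is only in tabular form.

It takes a while to identify trends and points of interest.

The differences become very easy to see once plotted on simple charts, even though all four sets share nearly identical summary statistics (mean, variance, correlation, regression line).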

Errors are also very hard to see in large data sets.

A hospital suspected its average age statistics were wrong, so it visualised the data.

Only then did they notice that multiple patients had an age recorded as 999!

They also found other monthly data series that were missing April’s data.
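Once a visualisation has exposed such problems, simple programmatic checks can catch them in future loads. The sketch below is illustrative only: the records, the `implausible_ages` and `missing_months` helpers, and the 120-year cut-off are all hypothetical assumptions, not the hospital's actual procedure.

```python
from datetime import date

# Hypothetical patient records; 999 mimics the sentinel age described above.
patients = [
    {"id": "p1", "age": 34},
    {"id": "p2", "age": 999},
    {"id": "p3", "age": 71},
    {"id": "p4", "age": 999},
]


def implausible_ages(records, max_age=120):
    """Return ids whose recorded age lies outside an assumed plausible range."""
    return [r["id"] for r in records if not 0 <= r["age"] <= max_age]


def missing_months(observed, year):
    """Return month numbers (1-12) that have no entry in a monthly series."""
    present = {d.month for d in observed if d.year == year}
    return [m for m in range(1, 13) if m not in present]


# Hypothetical monthly series for 2011 with April absent.
series = [date(2011, m, 1) for m in range(1, 13) if m != 4]

# The sentinel values distort the average badly.
raw_mean = sum(r["age"] for r in patients) / len(patients)
flagged = set(implausible_ages(patients))
clean = [r["age"] for r in patients if r["id"] not in flagged]
clean_mean = sum(clean) / len(clean)

print(implausible_ages(patients))    # ['p2', 'p4']
print(missing_months(series, 2011))  # [4]
print(raw_mean, clean_mean)          # 525.75 52.5
```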

 

Some Examples and Previous Projects: Multivariate and Temporal Visualisations

 

TimeSearcher

V1.3: designed for time series data; used for stocks, weather and genes.

Users specify patterns, and the tool supports rapid search.

Design goal: 200 time periods and 5,000 stocks, with updates in 100 ms.

k-d trees, quadtrees and grid files break down beyond 6-8 dimensions, so search was built on a rapid linear scan instead.

Uses the mantra above: users see an initial overview and can immediately identify points of interest (POI).

V2.0: allowed for 10,000 points and multivariate data.

Allowed control over the precision of a match, i.e. tightness of fit (linear, offset, noise, match).

V3.0: includes forecasting, among other features.
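A pattern query over time series can be sketched as a set of rectangular constraints checked by a plain linear scan, which stays fast at this scale because each series is tested independently. This is a minimal sketch assuming "timebox"-style queries (a time range plus a value range); the `TimeBox` class, `search` function and stock data are hypothetical and are not TimeSearcher's actual code.

```python
from dataclasses import dataclass


@dataclass
class TimeBox:
    """Hypothetical query box: every value in [t_start, t_end] must lie in [low, high]."""
    t_start: int
    t_end: int  # inclusive
    low: float
    high: float

    def matches(self, series):
        window = series[self.t_start:self.t_end + 1]
        return all(self.low <= v <= self.high for v in window)


def search(all_series, boxes):
    """Linear scan: keep every series that satisfies all boxes (an AND query)."""
    return [name for name, values in all_series.items()
            if all(box.matches(values) for box in boxes)]


# Hypothetical daily closing prices for three stocks.
stocks = {
    "AAA": [10, 11, 12, 13, 14, 15],
    "BBB": [10, 9, 8, 7, 6, 5],
    "CCC": [12, 12, 13, 13, 15, 16],
}

# "Started low (days 0-1 between 9 and 12), finished high (days 4-5 at 14-20)."
query = [TimeBox(0, 1, 9, 12), TimeBox(4, 5, 14, 20)]
print(search(stocks, query))  # ['AAA', 'CCC']
```

Widening a box's value range loosens the fit; narrowing it tightens it, which loosely mirrors the match-precision control described above.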

 

Lifelines: Patient Histories

www.cs.umd.edu/hcil/lifelines

Historical medical data: a visual overview of issues and updates, with an idea of the magnitude of each event.

Lifelines 2: Contrast+Creatinine

A large amount of patient data and histories: millions of people over 20 years.

Allows identification of patterns that would generally be slow or hard to spot.

Designed around ARF: Align, Rank and Filter.

Data can be aligned by chosen events, ranked, and narrowed with sequence filters.
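The three ARF operations can be sketched over a simple event-history structure. Everything here is an assumption for illustration: the data layout (patient id mapped to `(day, event)` pairs), the event names and the helper functions are hypothetical, not the Lifelines 2 implementation.

```python
# Hypothetical event histories: patient id -> list of (day, event) pairs.
histories = {
    "p1": [(1, "admit"), (3, "contrast"), (5, "creatinine_rise"), (9, "discharge")],
    "p2": [(2, "admit"), (4, "creatinine_rise"), (6, "contrast"), (8, "discharge")],
    "p3": [(0, "admit"), (1, "contrast"), (2, "contrast"), (7, "creatinine_rise")],
}


def align(events, sentinel):
    """Align: shift times so the first `sentinel` event is day 0 (None if absent)."""
    ordered = sorted(events)
    anchor = next((t for t, e in ordered if e == sentinel), None)
    if anchor is None:
        return None
    return [(t - anchor, e) for t, e in ordered]


def rank(histories, event):
    """Rank: order patients by how often `event` occurs, most first (stable on ties)."""
    return sorted(histories, key=lambda p: -sum(e == event for _, e in histories[p]))


def has_sequence(events, first, then):
    """Filter: does `then` occur at some point after `first`?"""
    seen_first = False
    for _, e in sorted(events):
        if seen_first and e == then:
            return True
        if e == first:
            seen_first = True
    return False


print(align(histories["p1"], "contrast"))
print(rank(histories, "contrast"))  # ['p3', 'p1', 'p2']
print([p for p, ev in histories.items()
       if has_sequence(ev, "contrast", "creatinine_rise")])  # ['p1', 'p3']
```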

 

Lifeflow: Aggregation Strategy.

www.cs.umd.edu/hcil/lifeflow

Temporal categorical data > data in Lifelines 2 format > tree of event sequences > Lifeflow aggregation

Visualises sequences of hospital visits: where did people go, and what happened to them?

Allows identification of “bounce backs”: patients arrive, are treated in the ICU, are sent to a ward and then bounce back to the ICU, which indicates staff missed something.

Can align by any event, e.g. to identify patients who went to a ward before the ICU.
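The "tree of event sequences" step can be sketched as a prefix tree in which each node counts the patients who share that prefix. This minimal sketch, with hypothetical visit data and helper names, counts patients only and ignores timing; it is not Lifeflow's actual implementation.

```python
# Hypothetical visit sequences, one per patient.
sequences = [
    ["ER", "ICU", "Ward", "Discharge"],
    ["ER", "ICU", "Ward", "ICU"],
    ["ER", "Ward", "Discharge"],
    ["ER", "ICU", "Ward", "Discharge"],
]


def build_tree(seqs):
    """Aggregate sequences into a prefix tree; each node counts patients with that prefix."""
    root = {"count": 0, "children": {}}
    for seq in seqs:
        node = root
        node["count"] += 1
        for event in seq:
            node = node["children"].setdefault(event, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def show(node, depth=0):
    """Print the tree as an indented outline."""
    for event, child in node["children"].items():
        print("  " * depth + f"{event} ({child['count']})")
        show(child, depth + 1)


tree = build_tree(sequences)
show(tree)
```

Each level of the printed outline corresponds to one step in the sequence, so shared early paths (ER then ICU) collapse into a single wide branch.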
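A bounce back can be detected programmatically as any return to the ICU after the patient has left it. This is a minimal sketch under that assumed definition; the helper names and visit data are hypothetical.

```python
def unit_stays(sequence, unit="ICU"):
    """Count separate stays in `unit`; consecutive repeats count as one stay."""
    return sum(1 for i, e in enumerate(sequence)
               if e == unit and (i == 0 or sequence[i - 1] != unit))


def bounce_backs(sequence, unit="ICU"):
    """Every stay after the first one counts as a bounce back."""
    return max(unit_stays(sequence, unit) - 1, 0)


# Hypothetical visit sequences.
visits = {
    "p1": ["ER", "ICU", "Ward", "Discharge"],
    "p2": ["ER", "ICU", "Ward", "ICU"],
    "p3": ["ER", "ICU", "Ward", "ICU", "Ward", "ICU"],
}

flagged = {p: bounce_backs(s) for p, s in visits.items() if bounce_backs(s) > 0}
print(flagged)  # {'p2': 1, 'p3': 2}
```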

 

Treemap: Gene Ontology

www.cs.umd.edu/hcil/treemap

Space-filling and space-limited, with colour coding and size coding, but requires learning.
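The space-filling, size-coded idea can be illustrated with the classic slice-and-dice layout, which gives each child a strip of its parent's rectangle proportional to its size and alternates the split direction at each level. This is a minimal sketch: the tuple-based tree format and function names are assumptions, and later layouts (such as the Voronoi treemaps mentioned in this talk) produce better-shaped rectangles.

```python
def size(node):
    """Total size: a leaf's own value, or the sum of its children's sizes."""
    _name, payload = node
    return sum(size(c) for c in payload) if isinstance(payload, list) else payload


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out a tree as nested rectangles; returns (name, x, y, w, h) per leaf.

    node is (name, size) for a leaf or (name, [children]) for an internal node.
    """
    if out is None:
        out = []
    name, payload = node
    if not isinstance(payload, list):  # leaf: emit its rectangle
        out.append((name, x, y, w, h))
        return out
    total = size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in payload:
        frac = size(child) / total
        if depth % 2 == 0:  # even depth: split left to right
            slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:  # odd depth: split top to bottom
            slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out


# Hypothetical directory tree with sizes in MB.
example = ("root", [("a", 6), ("b", [("b1", 2), ("b2", 2)])])
for rect in slice_and_dice(example, 0, 0, 10, 10):
    print(rect)
```

Each leaf's area is proportional to its size, so the whole display is used and large items stand out immediately.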

Practical example: www.smartmoney.com/marketmap

Provides a spatially stable map, enabling identification of differences over time, variance, etc.

An example of visualisations giving you answers to questions you didn’t know you had.

newsmap.jp: a treemap of Google News.

hivegroup.com: logistics and supply-chain treemaps.

Spotfire.com (bond portfolio analysis), the NY Times and the Guardian have all used treemaps in the past.

Voronoi treemaps appeared in a NY Times piece on inflation.

Treemaps have also been used to visualise hard drives across multiple computers in an organisation, showing wasted trash space and directories mirrored across many machines.

 

Network analysis

Visualcomplexity.com (not visual simplicity!): some great and some bad examples of network visualisations.

Discovery Process: Social Action

A network linking US Senators who vote similarly to each other.

Filtering out weak links reveals distinct clustering into two communities: Republicans and Democrats.

It also shows strong, weak and middle positions within each party.
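The link-filtering step can be sketched as: score every pair of senators by how often they voted the same way, drop links below a threshold, and look at what stays connected. The votes, the agreement measure, the 0.6 threshold and the union-find helper are all hypothetical assumptions for illustration, not how SocialAction or NodeXL computes its networks.

```python
from itertools import combinations

# Hypothetical roll-call votes: 1 = yea, 0 = nay, one entry per bill.
votes = {
    "D1": [1, 1, 0, 1, 1, 0],
    "D2": [1, 1, 0, 1, 0, 0],
    "D3": [1, 1, 0, 0, 1, 0],
    "R1": [0, 0, 1, 0, 0, 1],
    "R2": [0, 0, 1, 0, 1, 1],
    "R3": [0, 1, 1, 0, 0, 1],
}


def agreement(a, b):
    """Fraction of bills on which two senators voted the same way."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def strong_edges(votes, threshold):
    """Keep only links whose agreement is at least `threshold` (drops weak links)."""
    return [(a, b) for a, b in combinations(sorted(votes), 2)
            if agreement(votes[a], votes[b]) >= threshold]


def components(nodes, edges):
    """Connected components via union-find, returned as sorted lists."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


print(components(votes, strong_edges(votes, 0.6)))
```

With no threshold every pair is linked and the graph is one blob; raising it separates the two parties, and a senator's average agreement with the other party would hint at their position within their own.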

 

NodeXL

codeplex.com/nodexl

A network overview, discovery and exploration tool for Excel.

It can also show the Senate voting patterns discussed above.

Lets people with limited programming skills work with large amounts of social network data.

 

“Group-in-a-box” layout: treemaps combined with node-link diagrams

Innovation clusters: people, locations, companies.

11k nodes, 26k links

Visualising the clusters for the term “mouse” separates the animal, the input device and Mickey Mouse.

 

 

Summary of projects. 

Check out the book Analyzing Social Media Networks with NodeXL.

All of the work has tried to make an impact on the world, rather than being restricted to minor, unconnected problems.

Tried to focus on the UN Millennium Development Goals, a worthy task.

Q: With respect to clustering algorithms, how can I prove that this is a sensible network?

A: NodeXL uses 3 different clustering algorithms; current fashion pushes towards allowing membership of multiple communities.

Metrics for clustering are important. Most clustering works on network connections alone, but it should use node values too; this is starting to happen.

The goal of visualisation is insight, not pretty pictures, and we are learning how to do it better. Visualisation integrated with statistics is the way to go; it produces clues about which statistical methods to use next.

NodeXL’s goal was to allow access to statistical tools without programming.

 

Q: Have you done any work to identify and explore poor-quality data sets (medical data, for example)?

Favourite example: medical data showed a patient admitted to hospital 14 times but discharged only twice.

Admission is a billing event; discharge isn’t, so it gets less attention.
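A simple consistency check would surface the admitted-14-times, discharged-twice case automatically. This is a minimal sketch over a hypothetical event log; the log format and the `unbalanced` helper are assumptions for illustration.

```python
from collections import Counter

# Hypothetical event log of (patient_id, event_type) pairs.
log = [("p1", "admit")] * 14 + [("p1", "discharge")] * 2 + [
    ("p2", "admit"), ("p2", "discharge"),
    ("p3", "admit"), ("p3", "admit"), ("p3", "discharge"), ("p3", "discharge"),
]


def unbalanced(log):
    """Return {patient: (admits, discharges)} where the two counts disagree."""
    admits = Counter(p for p, e in log if e == "admit")
    discharges = Counter(p for p, e in log if e == "discharge")
    return {p: (admits[p], discharges[p])
            for p in sorted(admits.keys() | discharges.keys())
            if admits[p] != discharges[p]}


print(unbalanced(log))  # {'p1': (14, 2)}
```

A patient who is still in hospital legitimately has one more admission than discharges, so a real check might tolerate a difference of one.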

Outliers are an obvious indication of points of interest: they are either genuine or errors.

There is an overall commercial need for data cleanup.

Sub Q: If you don’t know the truth about events, and there is more than one record of an event, how do you choose?

Use context: e.g. align medical data by heart attacks and rank by number of attacks.

One patient appeared to have had 6 attacks, but this was actually 2 attacks, each reported 3 times by different people.

The public should be made aware of how poor medical data is!
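One way to reconcile duplicate reports is to treat reports of the same event for the same patient within a short window as one episode. This is a minimal sketch: the reports, the `distinct_events` helper and the 7-day window are hypothetical assumptions, since the talk only says that context was used.

```python
from datetime import date

# Hypothetical reports: (patient, event, date reported, reporter).
reports = [
    ("p7", "heart_attack", date(2010, 3, 1), "ER physician"),
    ("p7", "heart_attack", date(2010, 3, 1), "cardiologist"),
    ("p7", "heart_attack", date(2010, 3, 2), "GP"),
    ("p7", "heart_attack", date(2011, 8, 14), "ER physician"),
    ("p7", "heart_attack", date(2011, 8, 15), "cardiologist"),
    ("p7", "heart_attack", date(2011, 8, 16), "GP"),
]


def distinct_events(reports, window_days=7):
    """Merge reports of the same event for a patient within `window_days` of an episode's first report."""
    merged = []
    for patient, event, day, _reporter in sorted(reports, key=lambda r: (r[0], r[1], r[2])):
        last = merged[-1] if merged else None
        if last and last[0] == patient and last[1] == event and (day - last[2]).days <= window_days:
            continue  # same episode, already counted
        merged.append((patient, event, day))
    return merged


print(len(reports), len(distinct_events(reports)))  # 6 2
```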

 

Q: What would you say are the low-hanging fruit in the UN Millennium Development Goals?

The approach to research has shifted dramatically. Previously, promoting empirical controlled studies was the heart of HCI.

We are increasingly aware that these don’t pay off for questions of insight, which require much longer-term programs.

Some projects require weeks or months for users to get used to the visualisation tools: 2-4 weeks working with a domain expert, 2-4 more weeks on their own, then time to solve the problem. This is much longer term than standard small user studies.

One answer is to find agencies and commercial entities, ask them which small problems are blocking them from solving bigger problems, and help solve those.

He believes in breaking down the typical separations in research: work should be both basic and applied, both mission-driven and curiosity-driven.

Getting companies to work with you is hard; it can take 3-4 months to get to talk to the right people. But the resulting conversations, networking and projects are worth it.

Tools built for one problem are also often equally suited to a number of other problems.


Posted by elsmorian