Empirical Discovery: Concept and Workflow Model

June 20th, 2014 — 12:53pm

Con­cept mod­els are a pow­er­ful tool for artic­u­lat­ing the essen­tial ele­ments and rela­tion­ships that define new or com­plex things we need to under­stand.  We’ve pre­vi­ously defined empir­i­cal dis­cov­ery as a new method, look­ing at antecedents, and also com­par­ing and con­trast­ing the dis­tinc­tive char­ac­ter­is­tics of Empir­i­cal Dis­cov­ery with other knowl­edge cre­ation and insight seek­ing meth­ods.  I’m now shar­ing our con­cept model of Empir­i­cal Dis­cov­ery, which iden­ti­fies the most impor­tant actors, activ­i­ties, and out­comes of empir­i­cal dis­cov­ery efforts, to com­ple­ment the writ­ten def­i­n­i­tion by illus­trat­ing   how the method works in practice.

Empir­i­cal dis­cov­ery con­cept model from Joe Laman­tia

In this model, we illus­trate the activ­i­ties of the three kinds of peo­ple most cen­tral to dis­cov­ery efforts: Insight Con­sumers, Data Sci­en­tists, and Data Engi­neers.  We have robust def­i­n­i­tions of all the major actors involved in dis­cov­ery (used to drive prod­uct devel­op­ment), and may share some of these var­i­ous per­sonas, pro­files, and snap­shots sub­se­quently.  For read­ing this model, under­stand Insight Con­sumers as the peo­ple who rely on insights from dis­cov­ery efforts to effect and man­age the oper­a­tions of the busi­ness.  Data Sci­en­tists are the sense­mak­ers who achieve insights, and cre­ate data prod­ucts, and ana­lyt­i­cal mod­els through dis­cov­ery efforts.  Data Engi­neers enable dis­cov­ery efforts by build­ing the enter­prise data analy­sis infra­struc­ture nec­es­sary for dis­cov­ery, and often imple­ment the out­comes of empir­i­cal dis­cov­ery by build­ing new tools based on the insights and mod­els Data Sci­en­tists create.

A key assump­tion of this model is that dis­cov­ery is by def­i­n­i­tion an iter­a­tive and serendip­i­tous method, rely­ing on fre­quent back-steps and unpre­dictable rep­e­ti­tion of activ­i­ties as a nec­es­sary aspect of how dis­cov­ery efforts unfold.  This model also assumes the data, meth­ods, and tools shift dur­ing dis­cov­ery efforts, in keep­ing with the evo­lu­tion of moti­vat­ing ques­tions, and the achieve­ment of interim out­comes.  Sim­i­larly, dis­cov­ery efforts do not always involve all of these elements.

To keep the essen­tial struc­ture and rela­tion­ships between ele­ments clear and in the fore­ground, we have not shown all of the pos­si­ble iter­a­tive loops or repeated steps.  Some closely related con­cepts are grouped together, to allow read­ing the model on two lev­els of detail.

For a sim­pli­fied view, fol­low the links between named actors and groups of con­cepts shown with col­ored back­grounds and labels.  In this read­ing, an Insight Con­sumer artic­u­lates ques­tions to a Data Sci­en­tist, who com­bines domain knowl­edge with the Empir­i­cal Dis­cov­ery Method (yel­low) to direct the appli­ca­tion of Ana­lyt­i­cal Tools (blue) and Mod­els (salmon) to Data Sets (green) drawn from Data Sources (magenta).  The Data Sci­en­tist shares Insights result­ing from dis­cov­ery efforts with the Insight Con­sumer, while Data Engi­neers may imple­ment the mod­els or data prod­ucts cre­ated by the Data Sci­en­tist by turn­ing them into tools and infra­struc­ture for the rest of the busi­ness.  For a more detailed view of the spe­cific con­cepts and activ­i­ties com­mon to Empir­i­cal dis­cov­ery efforts, fol­low the links between the indi­vid­ual con­cepts within these named groups.  (Note: there are two kinds of con­nec­tions; solid arrows indi­cat­ing def­i­nite rela­tion­ships, and for the Data Sets and Mod­els groups, dashed arrows indi­cat­ing pos­si­ble paths of evo­lu­tion.  More on this to follow)

Another way to interpret the two levels of detail in this model is as descriptions of formal vs. informal implementations of the empirical discovery method.  People and organizations who take a more formal approach to empirical discovery may require explicitly defined artifacts and activities that address each major concept, such as predictions and experimental results.  In less formal approaches, Data Scientists may implicitly address each of the major concepts and activities, such as framing hypotheses, or tracking the states of data sets they are working with, without any formal artifact or decision gateway.  This situational flexibility follows from the applied nature of the empirical discovery method, which does not require scientific standards of proof and reproducibility to generate valued outcomes.

The story begins in the upper right corner, when an Insight Consumer articulates a belief or question to a Data Scientist, who then translates this motivating statement into a planned discovery effort that addresses the business goal.  The Data Scientist applies the Empirical Discovery Method (concepts in yellow), possibly generating a hypothesis and accompanying predictions which will be tested by experiments, choosing data from the range of available data sources (grouped in magenta), and selecting initial analytical methods consistent with the domain, the data sets (green), and the analytical or reference models (salmon) they will work with.  Given the particulars of the data and the analytical methods, the Data Scientist employs specific analytical tools (blue) such as algorithms and statistical or other measures, based on factors such as expected accuracy, and speed or ease of use.  As the effort progresses through iterations, or insights emerge, experiments may be added or revised, based on the conclusions the Data Scientist draws from the results and their impact on starting predictions or hypotheses.

For example, consider an Insight Consumer who works in a product management capacity for an on-line social network, with a business goal of increasing users’ level of engagement with the service.  This Insight Consumer wishes to identify opportunities to recommend that users establish new connections with other similar and possibly known users, based on unrecognized affinities in their posted profiles.  The Data Scientist translates this business goal into a series of experiments investigating predictions about which aspects of user profiles more effectively predict the likelihood of creating new connections in response to system-generated recommendations for similarity.  The Data Scientist frames experiments that rely on data from the accumulated logs of user activities within the network, anonymized to comply with privacy policies, selecting specific working sets of data to analyze based on awareness of the shape and nature of the attributes that appear directly in users’ profiles, both across the entire network and among pools of similar but unconnected users.  The Data Scientist plans to begin with analytical methods useful for predictive modeling of the effectiveness of recommender systems in network contexts, such as measurements of the affinity of users’ interests based on semantic analysis of social objects shared by users within this network and also publicly in other online media, and also structural or topological measures of relative position and distance from the field of network science.  The Data Scientist chooses a set of standard social network analysis algorithms and measures, combined with custom models for interpreting user activity and interest unique to this network.  The Data Scientist has predefined scripts and open source libraries (MLlib, Gephi, Weka, Pandas, etc.) available for ready application to data in the form of Analytical Tools, which she will combine in sequences according to the desired analytical flow for each experiment.

The nature of analytical engagement with data sets varies during the course of discovery efforts, with different types of data sets playing different roles at specific stages of the discovery workflow.  Our concept map simplifies the lifecycle of data for purposes of description, identifying five distinct and recognizable ways data are used by the Data Scientist, with five corresponding types of data sets.  In some cases, formal criteria on data quality, completeness, accuracy, and content govern which stage of the data lifecycle any given data set is at.  In most discovery efforts, however, Data Scientists themselves make a series of judgements about when and how the data in hand is suitable for use.  The dashed arrows linking the five types of data sets capture the approximate and conditional nature of these different stages of evolution.  In practice, discovery efforts begin with exploration of data that may or may not be relevant for focused analysis, but which requires some direct engagement and attention to rule in or out of consideration.  Focused analytical investigation of the relevant data follows, made possible by the iterative addition, refinement and transformation (wrangling; more on this in later posts) of the exploratory data in hand.  At this stage, the Data Scientist applies analytical tools identified by their chosen analytical method.  The model building stage seeks to create explicit, formal, and reusable models that articulate the patterns and structures found during investigation.  When validation of newly created analytical models is necessary, the Data Scientist uses appropriate data, typically data that was not part of explicit model creation.  Finally, training data is sometimes necessary to put models into production, either using them for further steps in analytical workflows (which can be very complex), or in business operations outside the analytical context.

Because so much discovery activity requires transformation of the data before or during analysis, there is great interest in the Data Science and business analytics industries in how Data Scientists and sensemakers work with data at these various stages.  Much of this attention focuses on the need for better tools for transforming data in order to make analysis possible.  This model does not explicitly represent wrangling as an activity, because it is not directly a part of the empirical discovery method; transformation is done only as and when needed to make analysis possible.  However, understanding the nature of wrangling and transformation activities is a very important topic for grasping discovery, so I’ll address it in later postings. (We have a good model for this too…)

Empir­i­cal dis­cov­ery efforts aim to cre­ate one or more of the three types of out­comes shown in orange: insights, mod­els, and data prod­ucts.  Insights, as we’ve defined them pre­vi­ously, are dis­cov­er­ies that change people’s per­spec­tive or under­stand­ing, not sim­ply the results of ana­lyt­i­cal activ­ity, such as the end val­ues of ana­lyt­i­cal cal­cu­la­tions, the gen­er­a­tion of reports, or the retrieval and aggre­ga­tion of stored information.

One of the most valuable outcomes of discovery efforts is the creation of externalized models that describe behavior, structure or relationships in clear and quantified terms.  The models that result from empirical discovery efforts can take many forms (google ‘predictive model’ for a sense of the tremendous variation in what people active in business analytics consider to be a useful model), but their defining characteristic is that a model always describes aspects of a subject of discovery and analysis that are not directly present in the data itself.  For example, if given the node and edge data identifying all of the connections between people in the social network above, one possible model resulting from analysis of the network structure is a descriptive readout of the topology of the network as scale-free, with some set of subgraphs, a range of node centrality values, a matrix of possible shortest paths between nodes or subgraphs, etc.  It is possible to make sense of, interpret, or circulate a model independently of the data it describes and is derived from.
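To make the distinction between data and model concrete, here is a minimal sketch in plain Python.  It assumes a tiny, made-up edge list (the names are purely illustrative) and derives two of the descriptive measures mentioned above: a degree-based centrality value for each node, and shortest path lengths from one node.  The helper names are hypothetical, and a real effort would more likely lean on a dedicated network analysis library.

```python
from collections import deque

# Hypothetical toy edge list; these connections are illustrative only.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("dan", "eve")]


def build_adjacency(edge_list):
    """Turn undirected edges into a node -> set-of-neighbours mapping."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Share of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neigh) / (n - 1) for node, neigh in adjacency.items()}


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: number of hops from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


adjacency = build_adjacency(edges)
centrality = degree_centrality(adjacency)
distances = shortest_path_lengths(adjacency, "bob")
```

Neither `centrality` nor `distances` appears anywhere in the edge list itself; both are derived descriptions that could be shared and interpreted without passing along the underlying data.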

Data Scientists also engage with models in distinct and recognizable ways during discovery efforts.  Reference models, determined by the domain of investigation, often guide exploratory analysis of discovery subjects by providing Data Scientists with general explanations and quantifications for processes and relationships common to the domain.  And the models generated as insight and understanding accumulate during discovery evolve in stages from initial articulation through validation to readiness for production implementation, which means being put into effect directly on the operations of the business.

Data prod­ucts are best under­stood as ‘pack­ages’ of data which have util­ity for other ana­lyt­i­cal or busi­ness pur­poses, such as a list of users in the social net­work who will form new con­nec­tions in response to system-generated sug­ges­tions of other sim­i­lar users.  Data prod­ucts are not lit­er­ally fin­ished prod­ucts that the busi­ness offers for exter­nal sale or con­sump­tion.  And as back­ground, we assume oper­a­tional­iza­tion or ‘imple­men­ta­tion’ of the out­comes of empir­i­cal dis­cov­ery efforts to change the func­tion­ing of the busi­ness is the goal of dif­fer­ent busi­ness processes, such as prod­uct devel­op­ment.  While empir­i­cal dis­cov­ery focuses on achiev­ing under­stand­ing, rather than mak­ing things, this is not the only thing Data Sci­en­tists do for the busi­ness.  The clas­sic def­i­n­i­tion of Data Sci­ence as aimed at cre­at­ing new prod­ucts based on data which impact the busi­ness, is a broad man­date, and many of the posi­tion descrip­tions for data sci­ence jobs require par­tic­i­pa­tion in prod­uct devel­op­ment efforts.

Two or more kinds of out­comes are often bun­dled together as the results of a gen­uinely suc­cess­ful dis­cov­ery effort; for exam­ple, an insight that two appar­ently uncon­nected busi­ness processes are in fact related through mutual feed­back loops, and a model explic­itly describ­ing and quan­ti­fy­ing the nature of the rela­tion­ships as dis­cov­ered through analysis.

There’s more to the story, but as one trip through the essen­tial ele­ments of empir­i­cal dis­cov­ery, this is a log­i­cal point to pause and ask what might be miss­ing from this model? And how can it be improved?



The Sensemaking Spectrum for Business Analytics: Translating from Data to Business Through Analysis

June 10th, 2014 — 8:33am

One of the most com­pelling out­comes of our strate­gic research efforts over the past sev­eral years is a grow­ing vocab­u­lary that artic­u­lates our cumu­la­tive under­stand­ing of the deep struc­ture of the domains of dis­cov­ery and busi­ness analytics.

Modes are one exam­ple of the deep struc­ture we’ve found.  After look­ing at dis­cov­ery activ­i­ties across a very wide range of indus­tries, ques­tion types, busi­ness needs, and prob­lem solv­ing approaches, we’ve iden­ti­fied dis­tinct and recur­ring kinds of sense­mak­ing activ­ity, inde­pen­dent of con­text.  We label these activ­i­ties Modes: Explore, com­pare, and com­pre­hend are three of the nine rec­og­niz­able modes.  Modes describe *how* peo­ple go about real­iz­ing insights.  (Read more about the pro­gram­matic research and for­mal aca­d­e­mic ground­ing and dis­cus­sion of the modes here: https://www.researchgate.net/publication/235971352_A_Taxonomy_of_Enterprise_Search_and_Discovery) By anal­ogy to lan­guages, modes are the ‘verbs’ of dis­cov­ery activ­ity.  When applied to the prac­ti­cal ques­tions of prod­uct strat­egy and devel­op­ment, the modes of dis­cov­ery allow one to iden­tify what kinds of ana­lyt­i­cal activ­ity a prod­uct, plat­form, or solu­tion needs to sup­port across a spread of usage sce­nar­ios, and then make con­crete and well-informed deci­sions about every aspect of the solu­tion, from high-level capa­bil­i­ties, to which spe­cific types of infor­ma­tion visu­al­iza­tions bet­ter enable these sce­nar­ios for the types of data users will analyze.

The modes are a powerful generative tool for product making, but if you’ve spent time with young children, or had a really bad hangover (or both at the same time…), you understand the difficulty of communicating using only verbs.

So I’m happy to share that we’ve found trac­tion on another facet of the deep struc­ture of dis­cov­ery and busi­ness ana­lyt­ics.  Con­tin­u­ing the lan­guage anal­ogy, we’ve iden­ti­fied some of the ‘nouns’ in the lan­guage of dis­cov­ery: specif­i­cally, the con­sis­tently recur­ring aspects of a busi­ness that peo­ple are look­ing for insight into.  We call these dis­cov­ery Sub­jects, since they iden­tify *what* peo­ple focus on dur­ing dis­cov­ery efforts, rather than *how* they go about dis­cov­ery as with the Modes.

Defining the collection of Subjects people repeatedly focus on allows us to understand and articulate sensemaking needs and activity in more specific, consistent, and complete fashion.  In combination with the Modes, we can use Subjects to concretely identify and define scenarios that describe people’s analytical needs and goals.  For example, a scenario such as ‘Explore [a Mode] the attrition rates [a Measure, one type of Subject] of our largest customers [Entities, another type of Subject]’ clearly captures the nature of the activity (exploration of trends vs. deep analysis of underlying factors) and the central focus (attrition rates for customers above a certain set of size criteria), from which follow many of the specifics needed to address this scenario in terms of data, analytical tools, and methods.
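As a rough illustration of how a Mode and a set of Subjects combine into a scenario, the sketch below encodes the example above as a small data structure.  The class and field names (`Mode`, `Subject`, `Scenario`, `describe`) are assumptions made for this sketch rather than part of the framework, and only the three Modes named in this post are listed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Mode(Enum):
    # Only the three Modes named in this post; the full set has nine.
    EXPLORE = "explore"
    COMPARE = "compare"
    COMPREHEND = "comprehend"


@dataclass(frozen=True)
class Subject:
    kind: str   # type of Subject, e.g. "Measure" or "Entity"
    label: str  # what the scenario refers to, in business terms


@dataclass(frozen=True)
class Scenario:
    mode: Mode
    subjects: Tuple[Subject, ...]

    def describe(self) -> str:
        """Render the scenario with its Mode and Subject types annotated."""
        parts = ", ".join(f"{s.label} [{s.kind}]" for s in self.subjects)
        return f"{self.mode.value.capitalize()} [Mode]: {parts}"


scenario = Scenario(
    mode=Mode.EXPLORE,
    subjects=(
        Subject("Measure", "attrition rates"),
        Subject("Entity", "largest customers"),
    ),
)
summary = scenario.describe()
```

Writing scenarios down in a structured form like this makes it straightforward to group them by Mode or by type of Subject when comparing needs across many usage scenarios.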

We can also use Subjects to translate effectively between the different perspectives that shape discovery efforts, reducing ambiguity and increasing impact on both sides of the perspective divide.  One example is translation from the language of business, which often motivates analytical work by asking questions in business terms, to the perspective of analysis.  The question posed to a Data Scientist or analyst may be something like “Why are sales of our new kinds of potato chips to our largest customers fluctuating unexpectedly this year?” or “Where can we innovate, by expanding our product portfolio to meet unmet needs?”.  Analysts translate questions and beliefs like these into one or more empirical discovery efforts that more formally and granularly indicate the plan, methods, tools, and desired outcomes of analysis.  From the perspective of analysis this second question might become, “Which customer needs of type ‘A’, identified and measured in terms of ‘B’, that are not directly or indirectly addressed by any of our current products, offer ‘X’ potential for ‘Y’ positive return on the investment ‘Z’ required to launch a new offering, in time frame ‘W’?  And how do these compare to each other?”.  Translation also happens from the perspective of analysis to the perspective of data, in terms of availability, quality, completeness, format, volume, etc.

By impli­ca­tion, we are propos­ing that most work­ing orga­ni­za­tions — small and large, for profit and non-profit, domes­tic and inter­na­tional, and in the major­ity of indus­tries — can be described for ana­lyt­i­cal pur­poses using this col­lec­tion of Sub­jects.  This is a bold claim, but sim­pli­fied artic­u­la­tion of com­plex­ity is one of the pri­mary goals of sense­mak­ing frame­works such as this one.  (And, yes, this is in fact a frame­work for mak­ing sense of sense­mak­ing as a cat­e­gory of activ­ity — but we’re not con­sid­er­ing the recur­sive aspects of this exer­cise at the moment.)

Compellingly, we can place the collection of Subjects on a single continuum (we call it the Sensemaking Spectrum) that simply and coherently illustrates some of the most important relationships between the different types of Subjects, and also illuminates several of the fundamental dynamics shaping business analytics as a domain.  As a corollary, the Sensemaking Spectrum also suggests innovation opportunities for products and services related to business analytics.

The first illus­tra­tion below shows Sub­jects arrayed along the Sense­mak­ing Spec­trum; the sec­ond illus­tra­tion presents exam­ples of each kind of Sub­ject.  Sub­jects appear in col­ors rang­ing from blue to reddish-orange, reflect­ing their place along the Spec­trum, which indi­cates whether a Sub­ject addresses more the view­point of sys­tems and data (Data cen­tric and blue), or peo­ple (User cen­tric and orange).  This axis is shown explic­itly above the Spec­trum.  Anno­ta­tions sug­gest how Sub­jects align with the three sig­nif­i­cant per­spec­tives of Data, Analy­sis, and Busi­ness that shape busi­ness ana­lyt­ics activ­ity.  This ren­der­ing makes explicit the trans­la­tion and bridg­ing func­tion of Ana­lysts as a role, and analy­sis as an activity.


Sub­jects are best under­stood as fuzzy cat­e­gories [http://georgelakoff.files.wordpress.com/2011/01/hedges-a-study-in-meaning-criteria-and-the-logic-of-fuzzy-concepts-journal-of-philosophical-logic-2-lakoff-19731.pdf], rather than tightly defined buck­ets.  For each Sub­ject, we sug­gest some of the most com­mon exam­ples: Enti­ties may be phys­i­cal things such as named prod­ucts, or loca­tions (a build­ing, or a city); they could be Con­cepts, such as sat­is­fac­tion; or they could be Rela­tion­ships between enti­ties, such as the vari­ety of pos­si­ble con­nec­tions that define link­age in social net­works.  Like­wise, Events may indi­cate a time and place in the dic­tio­nary sense; or they may be Trans­ac­tions involv­ing named enti­ties; or take the form of Sig­nals, such as ‘some Mea­sure had some value at some time’ — what many enter­prises under­stand as alerts.

The central story of the Spectrum is that though consumers of analytical insights (represented here by the Business perspective) need to work in terms of Subjects that are directly meaningful to their perspective, such as Themes, Plans, and Goals, the working realities of data (condition, structure, availability, completeness, cost) and the changing nature of most discovery efforts make direct engagement with source data in this fashion impossible.  Accordingly, business analytics as a domain is structured around the fundamental assumption that sensemaking depends on analytical transformation of data.  Analytical activity incrementally synthesizes more complex and larger scope Subjects from data in its starting condition, accumulating insight (and value) by moving through a progression of stages in which increasingly meaningful Subjects are iteratively synthesized from the data, and recombined with other Subjects.  The end goal of ‘laddering’ successive transformations is to enable sensemaking from the business perspective, rather than the analytical perspective.

Syn­the­sis through lad­der­ing is typ­i­cally accom­plished by spe­cial­ized Ana­lysts using ded­i­cated tools and meth­ods. Begin­ning with some moti­vat­ing ques­tion such as seek­ing oppor­tu­ni­ties to increase the effi­ciency (a Theme) of ful­fill­ment processes to reach some level of prof­itabil­ity by the end of the year (Plan), Ana­lysts will iter­a­tively wran­gle and trans­form source data Records, Val­ues and Attrib­utes into rec­og­niz­able Enti­ties, such as Prod­ucts, that can be com­bined with Mea­sures or other data into the Events (ship­ment of orders) that indi­cate the work­ings of the business.

More complex Subjects (to the right of the Spectrum) are composed of or make reference to less complex Subjects: a business Process such as Fulfillment will include Activities such as confirming, packing, and then shipping orders.  These Activities occur within or are conducted by organizational units such as teams of staff or partner firms (Networks), composed of Entities which are structured via Relationships, such as supplier and buyer.  The fulfillment process will involve other types of Entities, such as the products or services the business provides.  The success of the fulfillment process overall may be judged according to a sophisticated operating efficiency Model, which includes tiered Measures of business activity and health for the transactions and activities included.  All of this may be interpreted through an understanding of the operational domain of the business’s supply chain (a Domain).

We’ll dis­cuss the Spec­trum in more depth in suc­ceed­ing posts.


Defining and Applying a Language for Discovery

May 7th, 2014 — 1:10pm

Last year, I had the plea­sure of col­lab­o­rat­ing on a paper with Tony Russell-Rose and Stephann Makri that builds on and extends our work to under­stand and artic­u­late a frame­work for dis­cov­ery needs and activ­i­ties — what we refer to as the Lan­guage of Dis­cov­ery — show­ing exam­ples of con­crete appli­ca­tion and use.

It’s been a while in coming, but I’m happy to say the complete paper, ‘Defining and Applying a Language for Discovery’, is available now.

I’ve repro­duced the com­plete text of the paper below, and there’s also a pdf for download.



In order to design bet­ter search expe­ri­ences, we need to under­stand the com­plex­i­ties of human information-seeking behav­iour. In this paper, we pro­pose a model of infor­ma­tion behav­iour based on the needs of users across a range of search and dis­cov­ery sce­nar­ios. The model con­sists of a set of modes that users employ to sat­isfy their infor­ma­tion goals.

We dis­cuss how these modes relate to exist­ing mod­els of human infor­ma­tion seek­ing behav­iour, and iden­tify areas where they dif­fer. We then exam­ine how they can be applied in the design of inter­ac­tive sys­tems, and present exam­ples where indi­vid­ual modes have been imple­mented in inter­est­ing or novel ways. Finally, we con­sider the ways in which modes com­bine to form dis­tinct chains or pat­terns of behav­iour, and explore the use of such pat­terns both as an ana­lyt­i­cal tool for under­stand­ing infor­ma­tion behav­iour and as a gen­er­a­tive tool for design­ing search and dis­cov­ery experiences.

1 Introduction

Clas­sic IR (infor­ma­tion retrieval) is pred­i­cated on the notion of users search­ing for infor­ma­tion in order to sat­isfy a par­tic­u­lar ‘infor­ma­tion need’. How­ever, much of what we rec­og­nize as search behav­iour is often not infor­ma­tional per se. For exam­ple, Broder [2] has shown that the need under­ly­ing a given web search could in fact be nav­i­ga­tional (e.g. to find a par­tic­u­lar site) or trans­ac­tional (e.g. through online shop­ping, social media, etc.). Sim­i­larly, Rose & Levin­son [12] have iden­ti­fied the con­sump­tion of online resources as a fur­ther com­mon cat­e­gory of search behaviour.

In this paper, we exam­ine the behav­iour of indi­vid­u­als across a range of search sce­nar­ios. These are based on an analy­sis of user needs derived from a series of cus­tomer engage­ments involv­ing the devel­op­ment of cus­tomised search applications.

The model consists of a set of ‘search modes’ that users employ to satisfy their information search and discovery goals. It extends the IR concept of information-seeking to embrace a broader notion of discovery-oriented problem solving, addressing a wider range of information interaction and information use behaviours. The overall structure reflects Marchionini’s framework [8], consisting of three ‘lookup’ modes (locate, verify, monitor), three ‘learn’ modes (compare, comprehend, evaluate) and three ‘investigate’ modes (explore, analyze, synthesize).
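One way to make this grouping concrete is as a small lookup table. The sketch below is illustrative only: the names `SEARCH_MODES` and `category_of` are assumptions introduced here, not part of the model itself.

```python
# The nine modes, grouped into the three categories described above.
SEARCH_MODES = {
    "lookup": ("locate", "verify", "monitor"),
    "learn": ("compare", "comprehend", "evaluate"),
    "investigate": ("explore", "analyze", "synthesize"),
}


def category_of(mode):
    """Return the category ('lookup', 'learn' or 'investigate') for a mode."""
    for category, modes in SEARCH_MODES.items():
        if mode in modes:
            return category
    raise KeyError(f"unknown mode: {mode}")


# Flat list of all modes, in category order.
all_modes = [mode for modes in SEARCH_MODES.values() for mode in modes]
```

A table like this is enough to tag each captured scenario with a mode and then count how often each category occurs in a given domain.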

The paper is struc­tured as fol­lows. In Sec­tion 2 we dis­cuss the modes in detail and their rela­tion­ship to exist­ing mod­els of infor­ma­tion seek­ing behav­iour. Sec­tion 3 describes the data acqui­si­tion and the analy­sis process by which the modes were derived. In Sec­tion 4 we inves­ti­gate the degree to which the model scales to accom­mo­date diverse search con­texts (e.g. from consumer-oriented web­sites to enter­prise appli­ca­tions) and dis­cuss some of the ways in which user needs vary by domain. In addi­tion, we explore the ways in which modes com­bine to form dis­tinct chains or pat­terns, and reflect on the value this offers as a frame­work for express­ing com­plex pat­terns of infor­ma­tion seek­ing behaviour.

In Sec­tion 5 we exam­ine the prac­ti­cal impli­ca­tions of the model, dis­cussing how it can be applied in the design of inter­ac­tive appli­ca­tions, at both the level of indi­vid­ual modes and as com­pos­ite struc­tures. Finally, in Sec­tion 6 we reflect on the gen­eral util­ity of such mod­els and frame­works, and explore briefly the qual­i­ties that might facil­i­tate their increased adop­tion by the wider user expe­ri­ence design community.

2 Models of Information Seeking

The frame­work pro­posed in this study is influ­enced by a num­ber of pre­vi­ous mod­els. For exam­ple, Bates [1] iden­ti­fies a set of 29 search ‘tac­tics’ which she organ­ised into four broad cat­e­gories, includ­ing mon­i­tor­ing (“to keep a search on track”). Like­wise, O’Day & Jef­fries [11] exam­ined the use of infor­ma­tion search results by clients of pro­fes­sional infor­ma­tion inter­me­di­aries and iden­ti­fied three cat­e­gories of behav­iour, includ­ing mon­i­tor­ing a known topic or set of vari­ables over time and explor­ing a topic in an undi­rected fash­ion. They also observed that a given search sce­nario would often evolve into a series of inter­con­nected searches, delim­ited by trig­gers and stop con­di­tions that sig­nalled tran­si­tions between modes within an over­all scenario.

Cool & Belkin [3] pro­posed a clas­si­fi­ca­tion of inter­ac­tion with infor­ma­tion which included eval­u­ate and com­pre­hend. They also pro­posed cre­ate and mod­ify, which together reflect aspects of our syn­the­size mode.

Ellis and his colleagues [4, 5, 6] developed a model consisting of a number of broad information seeking behaviours, including monitoring and verifying (“checking the information and sources found for accuracy and errors”). In addition, his browsing mode (“semi-directed searching in an area of potential interest”) aligns with our definition of explore. He also noted that it is possible to display more than one behaviour at any given time. In revisiting Ellis’s findings among social scientists, Meho and Tibbo [10] identified analysing (although they did not elaborate on it in detail). More recently, Makri et al [8] proposed searching (“formulating a query in order to locate information”), which reflects our own definition of locate.

In addition to the research-oriented models outlined above, we should also consider practitioner-oriented frameworks. Spencer [14] suggests four modes of information seeking, including known-item (a subset of our locate mode) and exploratory (which mirrors our definition of explore). Lamantia [7] also identifies four modes, including monitoring.

In this paper, we use the char­ac­ter­is­tics of the mod­els above as a lens to inter­pret the behav­iours expressed in a new source of empir­i­cal data. We also exam­ine the com­bi­na­to­r­ial nature of the modes, extend­ing Ellis’s [5] con­cept of mode co-occurrence to iden­tify and define com­mon pat­terns and sequences of infor­ma­tion seek­ing behaviour.

3 Studying Search Behaviour

3.1 Data Acquisition

The pri­mary source of data in this study is a set of 381 infor­ma­tion needs cap­tured dur­ing client engage­ments involv­ing the devel­op­ment of a num­ber of cus­tom search appli­ca­tions. These infor­ma­tion needs take the form of ‘micro-scenarios’, i.e. a brief nar­ra­tive that illus­trates the end user’s goal and the pri­mary task or action they take to achieve it, for example:

  • Find best offers before the oth­ers do so I can have a high margin.
  • Get help and guid­ance on how to sell my car safely so that I can achieve a good price.
  • Under­stand what is sell­ing by area/region so I can source the cor­rect stock.
  • Understand a portfolio’s exposures to assess investment mix.
  • Understand the performance of a part in the field so that I can determine if I should replace it.

The sce­nar­ios were col­lected as part of a series of require­ments work­shops involv­ing stake­hold­ers and customer-facing staff from var­i­ous client organ­i­sa­tions. A pro­por­tion of these engage­ments focused on consumer-oriented site search appli­ca­tions (result­ing in 277 sce­nar­ios) and the remain­der on enter­prise search appli­ca­tions (104 scenarios).

The sce­nar­ios were gen­er­ated by par­tic­i­pants in break­out ses­sions and sub­se­quently mod­er­ated by the work­shop facil­i­ta­tor in a group ses­sion to max­imise con­sis­tency and min­imise redun­dancy or ambi­gu­ity. They were also pri­ori­tised by the group to iden­tify those that rep­re­sented the high­est value both to the end user and to the client organisation.

This data pos­sesses a num­ber of unique prop­er­ties. In pre­vi­ous stud­ies of infor­ma­tion seek­ing behav­iour (e.g. [5], [10]), the pri­mary source of data has tra­di­tion­ally been inter­view tran­scripts that pro­vide an indi­rect, ver­bal account of end user infor­ma­tion behav­iours.  By con­trast, the cur­rent data source rep­re­sents a self-reported account of infor­ma­tion needs, gen­er­ated directly by end users (although a pro­por­tion were cap­tured via proxy, e.g. through cus­tomer fac­ing staff speak­ing on behalf of the end users). This change of per­spec­tive means that instead of using infor­ma­tion behav­iours to infer infor­ma­tion needs and design insights, we can adopt the con­verse approach and use the stated needs to infer infor­ma­tion behav­iours and the inter­ac­tions required to sup­port them.

Moreover, the scope and focus of these scenarios represent a further point of differentiation. In previous studies (e.g. [8]), measures have been taken to address the limitations of using interview data by combining it with direct observation of information seeking behaviour in naturalistic settings. However, the behaviours that this approach reveals are still bounded by the functionality currently offered by existing systems and working practices, and as such do not reflect the full range of aspirational or unmet user needs encompassed by the data in this study.

Finally, the data is unique in that it constitutes a genuine practitioner-oriented deliverable, generated expressly for the purpose of designing and delivering commercial search applications. As such, it reflects a degree of realism and authenticity that interview data or other research-based interventions might struggle to replicate.

3.2 Data Analysis

These sce­nar­ios were man­u­ally ana­lyzed to iden­tify themes or modes that appeared con­sis­tently through­out the set, using a num­ber of iter­a­tions of a ‘propose-classify-refine’ cycle based on that of Rose & Levin­son [14]. Inevitably, this process was some­what sub­jec­tive, echo­ing the obser­va­tions made by Bates [1] in her work on search tactics:

“While our goal over the long term may be a parsimonious few, highly effective tactics, our goal in the short term should be to uncover as many as we can, as being of potential assistance. Then we can test the tactics and select the good ones. If we go for closure too soon, i.e., seek that parsimonious few prematurely, then we may miss some valuable tactics.”

In this respect, the process was par­tially deduc­tive, in apply­ing the insights from exist­ing mod­els to clas­sify the data in a top-down man­ner. But it was also par­tially induc­tive, apply­ing a bottom-up, grounded analy­sis to iden­tify new types of behav­iour not present in the orig­i­nal mod­els or to sug­gest revised def­i­n­i­tions of exist­ing behaviours.

A num­ber of the sce­nar­ios focused on needs that did not involve any explicit infor­ma­tion seek­ing or use behav­iour, e.g. “Achieve a good price for my cur­rent car”. These were excluded from the analy­sis. A fur­ther num­ber were incom­plete or ambigu­ous, or were essen­tially fea­ture requests (e.g. “Have flex­i­ble nav­i­ga­tion within the page”), and were also excluded.

The process resulted in the iden­ti­fi­ca­tion of nine pri­mary search modes, which are defined below along with an exam­ple sce­nario (from the domain of consumer-oriented search):

1. Locate: To find a specific (possibly known) item, e.g. “Find my reading list items quickly”. This mode encapsulates the stereotypical ‘findability’ task that is so commonly associated with site search. It is consistent with (but a superset of) Spencer’s [14] known item search mode. This was the most frequent mode in the site search scenarios (120 instances, which contrasts with just 2 for enterprise search).

2. Verify: To confirm that an item meets some specific, objective criterion, e.g. “See the correct price for singles and deals”. Often found in combination with locating, this mode is concerned with validating the accuracy of some data item, comparable to that proposed by Ellis et al. [5] (39 site search instances, 4 for enterprise search).

3. Monitor: To maintain awareness of the status of an item for purposes of management or control, e.g. “Alert me to new resources in my area”. This activity focuses on the state of asynchronous responsiveness and is consistent with that of Bates [1], O’Day and Jeffries [11], Ellis [4], and Lamantia [7] (13 site search instances, 17 for enterprise search).

4. Compare: To identify similarities & differences within a set of items, e.g. “Compare cars that are my possible candidates in detail”. This mode has not featured prominently in most of the previous models (with the possible exception of Marchionini’s), but accounted for a significant proportion of enterprise search behaviour [13]. Although a common feature on many ecommerce sites, it occurred relatively infrequently in the site search data (2 site search instances, 16 for enterprise search).

5. Comprehend: To generate independent insight by interpreting patterns within a data set, e.g. “Understand what my competitors are selling”. This activity focuses on the creation of knowledge or understanding and is consistent with that of Cool & Belkin [3] and Marchionini [9] (50 site search instances, 12 for enterprise search).

6. Evaluate: To use judgement to determine the value of an item with respect to a specific goal, e.g. “I want to know whether my agency is delivering best value”. This mode is similar in spirit to verify, in that it is concerned with validation of the data. However, while verify focuses on simple, objective fact checking, our conception of evaluate involves more subjective, knowledge-based judgement, similar to that proposed by Cool & Belkin [3] (61 site search instances, 78 for enterprise search).

7. Explore: To investigate an item or data set for the purpose of knowledge discovery, e.g. “Find useful stuff on my subject topic”. In some ways the boundaries of this mode are less prescribed than the others, but what the instances share is the characteristic of open ended, opportunistic search and browsing, in the spirit of O’Day and Jeffries’ [11] “exploring a topic in an undirected fashion” and Spencer’s [14] exploratory mode (110 site search instances, 16 for enterprise search).

8. Analyze: To examine an item or data set to identify patterns & relationships, e.g. “Analyze the market so I know where my strengths and weaknesses are”. This mode features less prominently in previous models, appearing as a sub-component of the processing stage in Meho & Tibbo’s [10] model, and overlapping somewhat with Cool & Belkin’s [3] organize. This definition is also consistent with that of Makri et al. [8], who identified analysing as an important aspect of lawyers’ interactive information behaviour and defined it as “examining in detail the elements or structure of the content found during information-seeking.” (p. 630). This was the most common element of the enterprise search scenarios (58 site search instances, 84 for enterprise search).

9. Synthesize: To create a novel or composite artefact from diverse inputs, e.g. “I need to create a reading list on celebrity sponsorship”. This mode also appears as a sub-component of the processing stage in Meho & Tibbo’s [10] model, and involves elements of Cool & Belkin’s [3] create and use. Of all the modes, this one is the most commonly associated with information use in its broadest sense (as opposed to information seeking). It was relatively rare within site search (5 site search instances, 15 for enterprise search).

Although the modes were generated from an independent data source and analysis process, we have retrospectively explored the degree to which they align with existing frameworks, e.g. Marchionini’s [9]. In this context, locate, verify, and monitor could be described as lower-level ‘lookup’ modes, compare, comprehend, and evaluate as ‘learn’ modes and explore, analyze, and synthesize as higher-level ‘investigate’ modes.
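As a minimal sketch, the grouping of the nine modes into three levels can be written down as a small lookup table. The names `Level`, `MODE_LEVEL`, `modes_at` and `highest_level` are illustrative assumptions rather than part of the original model; in particular, ranking a chain by the highest level it reaches is only one possible way to use the grouping.

```python
from enum import Enum


class Level(Enum):
    LOOKUP = "lookup"
    LEARN = "learn"
    INVESTIGATE = "investigate"


# Mode -> level, following the grouping described in the text.
MODE_LEVEL = {
    "Locate": Level.LOOKUP,
    "Verify": Level.LOOKUP,
    "Monitor": Level.LOOKUP,
    "Compare": Level.LEARN,
    "Comprehend": Level.LEARN,
    "Evaluate": Level.LEARN,
    "Explore": Level.INVESTIGATE,
    "Analyze": Level.INVESTIGATE,
    "Synthesize": Level.INVESTIGATE,
}


def modes_at(level: Level) -> list[str]:
    """Return the modes assigned to a level, in definition order."""
    return [mode for mode, lvl in MODE_LEVEL.items() if lvl is level]


def highest_level(chain: list[str]) -> Level:
    """Return the highest level reached by any mode in a chain (hypothetical helper)."""
    order = list(Level)  # LOOKUP, LEARN, INVESTIGATE in declaration order
    return max((MODE_LEVEL[mode] for mode in chain), key=order.index)


print(modes_at(Level.LEARN))
print(highest_level(["Locate", "Verify"]).value)
```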

4 Mode Sequences and Patterns

The modes defined above pro­vide an insight into the needs of users of site search and enter­prise search appli­ca­tions and a frame­work for under­stand­ing human infor­ma­tion seek­ing behav­iour. But their real value lies not so much in their occur­rence as indi­vid­ual instances but in the pat­terns of co-occurrence they reveal. In most sce­nar­ios, modes com­bine to form dis­tinct chains and pat­terns, echo­ing the tran­si­tions observed by O’Day and Jef­fries [11] and the com­bi­na­to­r­ial behav­iour alluded to by Ellis [5], who sug­gested that infor­ma­tion behav­iours can often be nested or dis­played in parallel.

Typ­i­cally these pat­terns con­sist of chains of length two or three, often with one par­tic­u­lar mode play­ing a dom­i­nant role. Site search, for exam­ple, was char­ac­ter­ized by the fol­low­ing patterns:

  1. Insight-driven search (Explore-Analyze-Comprehend): This pattern represents an exploratory search for insight or knowledge to resolve an explicit information need, e.g. “Assess the proper market value for my car”.
  2. Opportunistic search (Explore-Locate-Evaluate): In contrast to the explicit focus of Insight-driven search, this sequence represents a less directed exploration in the prospect of serendipitous discovery, e.g. “Find useful stuff on my subject topic”.
  3. Qualified search (Locate-Verify): This pattern represents a variant of the stereotypical findability task in which some element of immediate verification is required, e.g. “Find trucks that I am eligible to drive”.

By con­trast, enter­prise search was char­ac­ter­ized by a larger num­ber of more diverse sequences, such as:

  4. Comparative search (Analyze-Compare-Evaluate), e.g. “Replace a problematic part with an equivalent or better part without compromising quality and cost”.
  5. Exploratory search (Explore-Analyze-Evaluate), e.g. “Identify opportunities to optimize use of tooling capacity for my commodity/parts”.
  6. Strategic Insight (Analyze-Comprehend-Evaluate), e.g. “Understand a lead’s underlying positions so that I can assess the quality of the investment opportunity”.
  7. Strategic Oversight (Monitor-Analyze-Evaluate), e.g. “Monitor & assess commodity status against strategy/plan/target”.
  8. Comparison-driven Synthesis (Analyze-Compare-Synthesize), e.g. “Analyze and understand consumer-customer-market trends to inform brand strategy & communications plan”.

A fur­ther insight into these pat­terns can be obtained by pre­sent­ing them in dia­gram­matic form. Fig­ure 1 illus­trates sequences 1–3 above plus other com­monly found site search pat­terns as a net­work (with sequence num­bers shown on the arrows). It shows how cer­tain modes tend to func­tion as “ter­mi­nal” nodes, i.e. entry points or exit points for a given sce­nario. For exam­ple, Explore typ­i­cally func­tions as an open­ing, while Com­pre­hend and Eval­u­ate func­tion in clos­ing a sce­nario. Ana­lyze typ­i­cally appears as a bridge between an open­ing and clos­ing mode. The shad­ing indi­cates the mode ‘level’ alluded to ear­lier: light tones indi­cate ‘lookup’ modes, mid tones are the ‘learn’ modes, and dark tones are the ‘inves­ti­gate’ modes.

Fig. 1. Mode network for site search
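The opening, bridging and closing roles described for the network can be derived mechanically from the chains themselves. The sketch below uses only the three site search chains named in the text, so it covers less than the full figure, which also includes other commonly found patterns. The function names `mode_roles` and `edges` are assumptions made for illustration.

```python
from collections import Counter

# The three site search chains named in the text (sequences 1-3).
SITE_CHAINS = {
    "Insight-driven search": ("Explore", "Analyze", "Comprehend"),
    "Opportunistic search": ("Explore", "Locate", "Evaluate"),
    "Qualified search": ("Locate", "Verify"),
}


def mode_roles(chains):
    """Count how often each mode opens, bridges, or closes a chain."""
    roles = {"opening": Counter(), "bridge": Counter(), "closing": Counter()}
    for chain in chains:
        roles["opening"][chain[0]] += 1
        roles["closing"][chain[-1]] += 1
        for mode in chain[1:-1]:
            roles["bridge"][mode] += 1
    return roles


def edges(chains):
    """Return the directed edges (mode -> next mode) of the mode network."""
    return {(a, b) for chain in chains for a, b in zip(chain, chain[1:])}


roles = mode_roles(SITE_CHAINS.values())
print(roles["opening"].most_common())
print(sorted(edges(SITE_CHAINS.values())))
```

Even on this small subset, Explore appears only as an opening mode and Analyze only as a bridge. Locate plays two roles, opening one chain and bridging another.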

Fig­ure 2 illus­trates sequences 4–8 above plus other com­monly found pat­terns in the enter­prise search data.

Fig. 2. Mode network for enterprise search

The patterns described above allow us to reflect on some of the differences between the needs of site search users and those of enterprise search. Site search, for example, is characterized by an emphasis on simpler “lookup” behaviours such as Locate and Verify (120 and 39 instances respectively); modes which were relatively rare in enterprise search (2 and 4 instances respectively). By contrast, enterprise search is characterized by higher-level “learn” and “investigate” behaviours such as Analyze and Evaluate (84 and 78 instances respectively, compared to 58 and 61 for site search). Interestingly, in neither case was the stereotype of ‘search equals findability’ borne out: even in site search (where Locate was the most common mode), known-item search accounted for no more than a quarter of all instances.
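The comparison above can be reproduced from the instance counts given in the mode definitions. This sketch assumes that the denominator is the total number of mode instances per context, not the number of scenarios, since one scenario can exhibit several modes. The `shares` helper is a name introduced here for illustration.

```python
# Mode instance counts reported in the mode definitions: (site, enterprise).
COUNTS = {
    "Locate": (120, 2),
    "Verify": (39, 4),
    "Monitor": (13, 17),
    "Compare": (2, 16),
    "Comprehend": (50, 12),
    "Evaluate": (61, 78),
    "Explore": (110, 16),
    "Analyze": (58, 84),
    "Synthesize": (5, 15),
}


def shares(column: int) -> dict[str, float]:
    """Return each mode's share of all mode instances in one context."""
    total = sum(counts[column] for counts in COUNTS.values())
    return {mode: counts[column] / total for mode, counts in COUNTS.items()}


site, enterprise = shares(0), shares(1)
for mode in COUNTS:
    print(f"{mode:<11} site {site[mode]:6.1%}   enterprise {enterprise[mode]:6.1%}")
```

On these figures Locate is the largest site search mode at roughly a quarter of mode instances, and known-item search is only a subset of Locate. In enterprise search, Analyze and Evaluate together make up about two thirds of mode instances.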

But perhaps the biggest difference is in the composition of the chains: enterprise search is characterised by a wide variety of heterogeneous chains, while site search focuses on a small number of common trigrams and bigrams. Moreover, the enterprise search chains often displayed a fractal nature, in which certain chains were embedded within or triggered by others, to create larger, more complex sequences of behaviour.

5 Design Implications

Although the model offers a use­ful frame­work for under­stand­ing human infor­ma­tion seek­ing behav­iour, its real value lies in its use as a prac­ti­cal design resource. As such, it can pro­vide guid­ance on issues such as:

  • the fea­tures and func­tion­al­ity that should be avail­able at spe­cific points within a system;
  • the inter­ac­tion design of indi­vid­ual func­tions or components;
  • the design cues used to guide users toward specific areas of the task interface.

More­over, the model also has sig­nif­i­cant impli­ca­tions for the broader aspects of user expe­ri­ence design, such as the align­ment between the over­all struc­ture or con­cept model of a sys­tem and its users’ men­tal mod­els, and the task work­flows for var­i­ous users and con­texts. This broader per­spec­tive addresses archi­tec­tural ques­tions such as the nature of the work­spaces required by a given appli­ca­tion, or the paths that users will take when nav­i­gat­ing within a system’s struc­ture.  In this way, the modes also act as a gen­er­a­tive tool for larger, com­pos­ite design issues and structures.

5.1 Indi­vid­ual modes

On their own, each of the modes describes a type of behaviour that may need to be supported by a given information system’s design. For example, an online retail site should support locating and comparing specific products, and ideally also comprehending differences and evaluating tradeoffs between them. Likewise, an enterprise application for electronic component selection should support monitoring and verifying the suitability of particular parts, and ideally also analyzing and comprehending any relevant patterns and trends in their lifecycle. By understanding the anticipated search modes for a given system, we can optimize the design to support specific user behaviours. In the following section we consider individual instances of search modes and explore some of their design implications.


Locate

This mode encapsulates the stereotypical ‘findability’ task that is so commonly associated with site search. But support for this mode can go far beyond simple keyword entry. For example, by allowing the user to choose from a list of candidates, auto-complete transforms the query formulation problem from one of recall into one of recognition (Figure 3).

Fig. 3. Auto-complete supports locating
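A minimal sketch of the idea behind auto-complete: given what the user has typed so far, offer a short list of candidates to recognise instead of requiring the full query to be recalled. The vocabulary and the `autocomplete` function are hypothetical, and real implementations typically rank candidates by popularity rather than alphabetically.

```python
from bisect import bisect_left


def autocomplete(prefix: str, vocabulary: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` vocabulary entries that start with `prefix`.

    `vocabulary` must be sorted and lower-cased.
    """
    prefix = prefix.lower()
    start = bisect_left(vocabulary, prefix)  # first entry >= prefix
    matches = []
    for term in vocabulary[start:]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches


# Hypothetical query vocabulary, e.g. drawn from past queries or product names.
VOCABULARY = sorted(
    ["reading glasses", "reading lamp", "reading list", "record player", "red shoes"]
)
print(autocomplete("read", VOCABULARY))
```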

Like­wise, Amazon’s par­tial match strat­egy deals with poten­tially failed queries by iden­ti­fy­ing the key­word per­mu­ta­tions that are likely to pro­duce use­ful results. More­over, by ren­der­ing the non-matching key­words in strikethrough text, it facil­i­tates a more informed approach to query refor­mu­la­tion (Fig­ure 4).

Fig. 4. Partial matches support locating
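One way to approximate a partial match strategy is to drop keywords from a failed query until some subset returns results, then show the dropped keywords struck through. This is an illustrative sketch only and makes no claim about how Amazon implements the feature. The catalogue, the `search` function and the `~~` strikethrough notation are all assumptions.

```python
from itertools import combinations


def partial_matches(keywords, search):
    """Find the largest keyword subsets that return results.

    `search` is a callable taking a tuple of keywords and returning a list
    of results. Returns a list of (kept, dropped) keyword tuples.
    """
    for size in range(len(keywords), 0, -1):
        found = []
        for kept in combinations(keywords, size):
            if search(kept):
                dropped = tuple(k for k in keywords if k not in kept)
                found.append((kept, dropped))
        if found:
            return found  # stop at the largest subset size that succeeds
    return []


def render(keywords, dropped):
    """Render a query with dropped keywords marked as ~~strikethrough~~."""
    return " ".join(f"~~{k}~~" if k in dropped else k for k in keywords)


# Hypothetical catalogue and an AND-style keyword search over it.
CATALOGUE = ["blue cotton shirt", "blue wool jumper", "red cotton shirt"]


def search(kept):
    return [item for item in CATALOGUE if all(k in item.split() for k in kept)]


query = ("blue", "silk", "shirt")
for kept, dropped in partial_matches(query, search):
    print(render(query, dropped), "->", search(kept))
```

Trying subsets from largest to smallest keeps as much of the user's original intent as possible, and the rendered query shows which terms were responsible for the failure.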


Verify

In this mode, the user is inspecting a particular item and wishing to confirm that it meets some specific criterion.  Google’s image results page provides a good example of this (see Figure 5).

Fig. 5. Search result previews support verification

On mouseover, the image is zoomed in to show a mag­ni­fied ver­sion along with key meta­data, such as file­name, image size, cap­tion, and source. This allows the user to ver­ify the suit­abil­ity of a spe­cific result in the con­text of its alter­na­tives. Like­wise, there may be cases where the user needs to ver­ify a par­tic­u­lar query rather than a par­tic­u­lar result. In pro­vid­ing real-time feed­back after every key press, Google Instant sup­ports ver­i­fi­ca­tion by pre­view­ing the results that will be returned for a given query (Fig­ure 6). If the results seem unex­pected, the user can check the query for errors or try alter­na­tive spellings or key­word combinations.

Fig. 6. Instant results support verification of queries


Compare

The Compare mode is fundamental to online retail, where users need to identify the best option from the choices available. A common technique is to provide a custom view in which details of each item are shown in separate columns, enabling rapid comparison of product attributes. Best Buy, for example, supports comparison by organising the attributes into logical groups and automatically highlighting the differences (Figure 7).

Fig. 7. Separate views support product comparison
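Highlighting differences in a comparison view comes down to finding the attributes whose values are not the same for every item. The sketch below shows one way to do that. The products, attribute names and the `differing_attributes` function are hypothetical, and treating a missing attribute as a difference is a design assumption.

```python
def differing_attributes(items: dict[str, dict[str, str]]) -> list[str]:
    """Return the attributes whose values are not identical across all items."""
    attributes = []
    for specs in items.values():
        for name in specs:
            if name not in attributes:
                attributes.append(name)  # keep first-seen order
    return [
        name
        for name in attributes
        # A missing attribute yields None, which counts as a difference.
        if len({specs.get(name) for specs in items.values()}) > 1
    ]


# Hypothetical products and attributes.
PRODUCTS = {
    "Camera A": {"Sensor": "24 MP", "Zoom": "3x", "Wi-Fi": "Yes"},
    "Camera B": {"Sensor": "24 MP", "Zoom": "5x", "Wi-Fi": "Yes"},
    "Camera C": {"Sensor": "24 MP", "Zoom": "5x"},
}

highlight = differing_attributes(PRODUCTS)
for name in ["Sensor", "Zoom", "Wi-Fi"]:
    marker = "*" if name in highlight else " "  # '*' marks a highlighted row
    values = [specs.get(name, "-") for specs in PRODUCTS.values()]
    print(marker, name.ljust(7), *values, sep="  ")
```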

But com­par­i­son is not restricted to qual­i­ta­tive attrib­utes. In finan­cial ser­vices, for exam­ple, it is vital to com­pare stock per­for­mance and other finan­cial instru­ments with indus­try bench­marks. Google Finance sup­ports the com­par­i­son of secu­ri­ties through a com­mon chart­ing com­po­nent (Fig­ure 8).

Fig. 8. Common charts allow comparison of quantitative data


Explore

A key principle in exploring is differentiating between where you are going and where you have already been. In fact, this distinction is so important that it has been woven into the fabric of the web itself, with unexplored hyperlinks rendered in blue by default, and visited hyperlinks shown in magenta. Amazon takes this principle a step further, through components such as a ‘Recent Searches’ panel showing the previous queries issued in the current session, and a ‘Recent History’ panel showing the items recently viewed (Figure 9).

Fig. 9. Recent history supports exploration

Another sim­ple tech­nique for encour­ag­ing explo­ration is through the use of “see also” pan­els. Online retail­ers com­monly use these to pro­mote related prod­ucts such as acces­sories and other items to com­ple­ment an intended pur­chase. An exam­ple of this can be seen at Food Net­work, in which fea­tured videos and prod­ucts are shown along­side the pri­mary search results (Fig­ure 10).

Fig. 10. ‘See Also’ panels support exploration

A fur­ther tech­nique for sup­port­ing explo­ration is through the use of auto-suggest. While auto-complete helps users get an idea out of their heads and into the search box, auto-suggest throws new ideas into the mix. In this respect, it helps users explore by for­mu­lat­ing more use­ful queries than they might oth­er­wise have thought of on their own. Home Depot, for exam­ple, pro­vides a par­tic­u­larly exten­sive auto-suggest func­tion con­sist­ing of prod­uct cat­e­gories, buy­ing guides, project guides and more, encour­ag­ing the dis­cov­ery of new prod­uct ideas and con­tent (Fig­ure 11).

Fig. 11. Auto-suggest supports exploratory search


Analyze

In modes such as exploring, the user’s primary concern is in understanding the overall information space and identifying areas to analyze in further detail. Analysis, in this sense, goes hand in hand with exploring, as together they present complementary modes that allow search to progress beyond the traditional confines of information retrieval or ‘findability’.

A simple example of this can be found at Google Patents (Figure 12). The alternate views (Cover View and List View) allow the user to switch between rapid exploration (scanning titles, browsing thumbnails, looking for information scent) and a more detailed analysis of each record and its metadata.

Fig. 12. Alternate views support mode switching between exploration and analysis

In the above exam­ple the analy­sis focuses on qual­i­ta­tive infor­ma­tion derived from pre­dom­i­nantly tex­tual sources. Other appli­ca­tions focus on quan­ti­ta­tive data in the form of aggre­gate pat­terns across col­lec­tions of records. News­Sift, for exam­ple, pro­vided a set of data visu­al­iza­tions which allowed the user to ana­lyze results for a given news topic at the aggre­gate level, gain­ing an insight that could not be obtained from exam­in­ing indi­vid­ual records alone (Fig­ure 13).

Fig. 13. Visualizations support analysis of quantitative information

5.2 Com­pos­ite patterns

The exam­ples above rep­re­sent instances of indi­vid­ual modes, show­ing var­i­ous ways they can be sup­ported by one or more aspects of a system’s design. How­ever, a key fea­ture of the model is its empha­sis on the com­bi­na­to­r­ial nature of modes and the pat­terns of co-occurrence this reveals [12]. In this respect, its true value is in help­ing design­ers to address more holis­tic, larger scale con­cerns such as the appro­pri­ate struc­ture, con­cept model, and orga­niz­ing prin­ci­ples of a sys­tem, as well as the func­tional and infor­ma­tional con­tent of its major com­po­nents and con­nec­tions between them.

Design at this level relies on trans­lat­ing com­pos­ite modes and chains that rep­re­sent sense-making activ­i­ties – often artic­u­lated as user jour­neys through a task and infor­ma­tion space – into inter­ac­tion com­po­nents that rep­re­sent mean­ing­ful com­bi­na­tions of infor­ma­tion and dis­cov­ery capa­bil­i­ties [13].  These com­po­nents serve as ‘build­ing blocks’ that design­ers can assem­ble into larger com­pos­ite struc­tures to cre­ate a user expe­ri­ence that sup­ports the antic­i­pated user jour­neys and aligns with their users’ men­tal mod­els [14].

The pop­u­lar micro-blogging ser­vice twitter.com pro­vides a num­ber of exam­ples of the cor­re­spon­dence between com­pos­ite modes and inter­ac­tion com­po­nents assem­bled at var­i­ous lev­els to pro­vide a coher­ent user expe­ri­ence architecture.

Header Bar

The header bar at the top of most pages of twitter.com com­bines sev­eral infor­ma­tional and func­tional ele­ments together in a sin­gle com­po­nent that sup­ports a num­ber of modes and mode chains (Fig­ure 14). It includes four dynamic sta­tus indi­ca­tors that address key aspects of twitter’s con­cept model and the users’ men­tal models:

  • the pres­ence of new tweets by peo­ple the user follows
  • inter­ac­tions with other twit­ter users such as fol­low­ing them or men­tion­ing them in a tweet
  • activ­ity related to the user’s pro­file, such as their lat­est tweets and shared media
  • people, topics, or items of interest suggested by the system’s recommender functions

These status indicator icons update automatically and provide links to specific pages in the twitter.com application architecture that provide further detail on each area of focus. The header bar thus enables Monitoring of a user’s activity within the full scope of the twitter.com network; i.e. its content, members, their activities, etc.  The header bar also enables Monitoring activity within almost all the workspaces that users encounter in the course of their primary journeys through twitter.com.

Fig. 14. twitter.com Header Bar

The Strate­gic Over­sight chain (Mon­i­tor – Ana­lyze – Eval­u­ate) is a fun­da­men­tal sequence for twit­ter users, repeated fre­quently with dif­fer­ent aspects of the user’s pro­file. The header bar sup­ports the first step of this chain, in which users Mon­i­tor the net­work for con­tent and activ­ity of inter­est to them, and then tran­si­tion to Analy­sis and Eval­u­a­tion of that activ­ity by nav­i­gat­ing to des­ti­na­tion pages for fur­ther detail.

The header bar also includes a search box featuring auto-complete and auto-suggest functionality, which provides support for the Qualified Search mode chain (Locate – Verify). The search box also enables users to initiate many other mode chains by supporting the Explore mode. These include Exploratory Search (Explore – Analyze – Evaluate), Insight-driven Search (Explore – Analyze – Comprehend), and Opportunistic Search (Explore – Locate – Evaluate). All these mode chains overlap by sharing a common starting point. This is one of the most readily recognizable kinds of composition, and often corresponds to a single instance of a particular interaction component.

The header bar includes sup­port for post­ing or Syn­the­siz­ing new tweets, reflect­ing the fact that the cre­ation of new con­tent is prob­a­bly the sec­ond most impor­tant indi­vid­ual mode (after Mon­i­tor­ing). A menu of links to admin­is­tra­tive pages and func­tions for man­ag­ing one’s twit­ter account com­pletes the con­tent of the header bar.

Indi­vid­ual Tweets

The indi­vid­ual tweets and activ­ity updates that make up the stream at the heart of the pri­mary work­space are the most impor­tant inter­ac­tion com­po­nents of the twit­ter expe­ri­ence, and their design shows a direct cor­re­spon­dence to many com­pos­ite modes and chains (Fig­ure 15). Indi­vid­ual items pro­vide the con­tent of a tweet along with the author’s pub­lic name, their twit­ter user­name, pro­file image, and the time elapsed since the tweet’s cre­ation. Together, these details allow users to Com­pare and Com­pre­hend the con­tent and sig­nif­i­cance of tweets in their own stream.  As users read more tweets and begin to rec­og­nize authors and top­ics, they can Com­pare, Ana­lyze, and Eval­u­ate them.  The indi­ca­tors of ori­gin and activ­ity allow users to Com­pare and Com­pre­hend the top­ics and inter­ests of other twit­ter users.

Fig. 15. Individual Tweet

Options to invoke a num­ber of func­tions that cor­re­spond to other dis­cov­ery modes are embed­ded within the indi­vid­ual items in the stream. For exam­ple, if an update was retweeted, it is marked as such with the orig­i­nal author indi­cated and their pro­file page linked. It also shows the num­ber of times the tweet has been retweeted and favor­ited, with links that open modal pre­views of the list of users who did so. This sup­ports Mon­i­tor­ing, Explo­ration and Com­pre­hen­sion of the sig­nif­i­cance and atten­tion an indi­vid­ual tweet has received, while the links sup­port Loca­tion, Ver­i­fi­ca­tion and Mon­i­tor­ing of the other users who retweeted or favor­ited it.

Pub­lic pro­file names and user­names are linked to pages which sum­ma­rize the activ­i­ties and rela­tion­ships of the author of a tweet, enabling users to Locate and Ver­ify authors, then tran­si­tion to Mon­i­tor­ing, Explor­ing and Com­pre­hend­ing their activ­i­ties, inter­ests, and how they are con­nected to the rest of the twit­ter network.

Hash­tags are pre­sented with dis­tinct visual treat­ment.  When users click on one, it ini­ti­ates a search using the hash­tag, allow­ing users to Locate, Explore, Com­pre­hend, and Ana­lyze the topic referred to, any con­ver­sa­tions in which the tag is men­tioned, and the users who employ the tag.

Fig. 16. Expanded Tweet

Longer tweets are trun­cated, offer­ing an ‘Expand’ link which opens a panel dis­play­ing the num­ber of retweets and favourites and the images of the users who did so, along with the date and time of author­ing and a link to a ‘details’ page for a per­ma­nent URL that other users and exter­nal ser­vices can ref­er­ence (Fig­ure 16). This sort of trun­ca­tion enables users to more eas­ily Explore the full set of tweets in a stream and Locate indi­vid­ual items of inter­est. Con­versely, the ‘Expand’ panel allows the user to more eas­ily Explore and Com­pre­hend indi­vid­ual items.

Tweets that con­tain links to other tweets offer a ‘View tweet’ link, which opens a panel dis­play­ing the full con­tents of the orig­i­nal tweet, the date and time of post­ing, the num­ber of retweets and favorites and a pre­view list of the users who did so.  The ‘View tweet’ link thus sup­ports the Locate, Explore, and Com­pre­hend modes for indi­vid­ual updates.

Tweets that con­tain links to dig­i­tal assets such as pho­tos, videos, songs, pre­sen­ta­tions, and doc­u­ments, offer users the abil­ity to pre­view these assets directly within an expanded dis­play panel, pro­vid­ing sup­port for the Locate, Explore, and Com­pre­hend modes. These pre­views link to the source of the assets, enabling users to Locate them.  Users can also ‘flag’ media for review by twit­ter (e.g. due to vio­la­tion of poli­cies about sen­si­tive or ille­gal imagery) – which is a very spe­cific form of Evaluation.

Fig. 17. Tweet Displaying a Photo

Tweets that contain links to items such as articles published by newspapers, magazines, and journals, or recognized destinations such as Foursquare and Google+ pages, offer a ‘Summary’ link (Figure 17). This link opens a panel that presents the first paragraph of the article or destination URL, an image from the original publisher, and a list of users who have retweeted or favorited it, thus supporting Location, Exploration and Verification of the linked item.

A text input field seeded with the author’s user­name allows users to reply to spe­cific tweets directly from an indi­vid­ual update. Users can also ‘retweet’ items directly from the list. Both func­tions are forms of Syn­the­sis, and encour­age users to cre­ate fur­ther con­tent and rela­tion­ships within the network.

Users can mark tweets as ‘favorites’ to indi­cate the impor­tance or value of these tweets to oth­ers; a clear exam­ple of the Eval­u­a­tion mode. Favorites also allow users to build a col­lec­tion of tweets curated for retrieval and inter­pre­ta­tion, enabling the Locate, Com­pare, Com­pre­hend, and Ana­lyze modes for tweets as indi­vid­ual items or as groups.

A ‘More’ link opens a menu offering ‘Email Tweet’ and ‘Embed Tweet’ options, allowing users to initiate tasks that take tweets outside the twitter environment.  These two functions support information usage modes, rather than search and discovery modes, so their distinct treatment – invoked via a different interaction than the other functions – is consistent with the great emphasis the twitter experience places on discovery and sense making activities.

If the tweet is part of a con­ver­sa­tion, a ‘View this con­ver­sa­tion’ link allows read­ers to open a panel that presents related tweets and user activ­ity as a sin­gle thread, accom­pa­nied by a reply field.  This pro­vides sup­port for the Locate, Explore, Com­pre­hend, Ana­lyze, Eval­u­ate and Syn­the­size modes (Fig­ure 18).

Fig. 18. Tweet Showing a Conversation

The infor­ma­tional and func­tional con­tent pre­sented by indi­vid­ual items in their var­i­ous forms enables a num­ber of mode chains. These include Strate­gic Over­sight, in which users main­tain aware­ness of con­ver­sa­tions, top­ics, other users, and activ­i­ties; Strate­gic Insight, wherein users focus on and derive insight into con­ver­sa­tions, top­ics, and other users; and Com­par­a­tive Syn­the­sis, in which users real­ize new insights and cre­ate new con­tent through direct engage­ment with con­ver­sa­tions, top­ics, and other users.

In a man­ner sim­i­lar to the search box, this inter­ac­tion com­po­nent serves as an ini­ti­a­tion point for a num­ber of mode chains, includ­ing Exploratory Search, Insight-driven Search, and Opportunity-driven Search. Indi­vid­ual tweets thus com­bine sup­port for many impor­tant modes and mode chains into a sin­gle inter­ac­tion com­po­nent.  As a con­se­quence, they need to be rel­a­tively rich and ‘dense’, com­pact­ing much func­tion­al­ity into a sin­gle inter­ac­tion com­po­nent, but this reflects their cru­cial role in the user jour­neys that char­ac­ter­ize the twit­ter experience.
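The mode support described for the individual tweet component can be summarised as a simple lookup table. The following is a minimal sketch in Python, offered only as an illustration: the names `TWEET_FUNCTION_MODES`, `functions_supporting`, and `all_supported_modes` are hypothetical, and the mode assignments are transcribed from the descriptions above rather than taken from any twitter API.

```python
# Illustrative only: which discovery modes each tweet function supports,
# transcribed from the prose above. All names here are hypothetical.
TWEET_FUNCTION_MODES = {
    "expand": {"Explore", "Comprehend"},
    "view tweet": {"Locate", "Explore", "Comprehend"},
    "media preview": {"Locate", "Explore", "Comprehend"},
    "flag media": {"Evaluate"},
    "summary": {"Locate", "Explore", "Verify"},
    "reply": {"Synthesize"},
    "retweet": {"Synthesize"},
    "favorite": {"Evaluate", "Locate", "Compare", "Comprehend", "Analyze"},
    "view conversation": {
        "Locate", "Explore", "Comprehend", "Analyze", "Evaluate", "Synthesize",
    },
}


def functions_supporting(mode):
    """Return the sorted names of tweet functions that support the given mode."""
    return sorted(
        name for name, modes in TWEET_FUNCTION_MODES.items() if mode in modes
    )


def all_supported_modes():
    """Return the union of modes supported by any function of the tweet component."""
    return set().union(*TWEET_FUNCTION_MODES.values())


if __name__ == "__main__":
    print(functions_supporting("Synthesize"))
    print(sorted(all_supported_modes()))
```

Laid out this way, the ‘density’ of the component is easy to see: a handful of functions together cover most of the modes discussed in this section.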

Pri­mary Work­spaces and Pages

In the previous section we reviewed the correspondence between groups of modes and the interaction components of a user experience. In this section, we review the ways in which modes and chains impact the composition and presentation of the next level of UX structure within the system: workspaces.

The pri­mary work­spaces of twitter.com all empha­size inter­ac­tion with a stream of indi­vid­ual updates, but the focus and con­tent vary depend­ing on the con­text. On the Home page, for exam­ple, the cen­tral stream con­sists of tweets from peo­ple the user fol­lows, while on the ‘Me’ page the stream con­sists of the tweets cre­ated by the user (Fig­ure 19). How­ever, the lay­out of these pages remains con­sis­tent: the work­space is dom­i­nated by a sin­gle cen­tral stream of indi­vid­ual updates. The pri­mary inter­ac­tion mode for this stream is Mon­i­tor­ing, evi­dent from the count of new items added to the net­work since the last page refresh.

Fig. 19. twitter.com Home Workspace

The place­ment of the header bar at the top of all of the pri­mary work­spaces is a design deci­sion that reflects the pri­macy of Mon­i­tor­ing as a mode of engage­ment with the twit­ter ser­vice; sup­port­ing its role as a per­sis­tent ‘back­ground’ mode of dis­cov­ery inde­pen­dent of the user’s cur­rent point in a task or jour­ney, and its role as a com­mon entry point to the other mode chains and user journeys.

The consistent placement of the ‘Compose new Tweet’ control in the upper right corner of the workspace reflects known interaction design principles (corners are the second most easily engaged areas of a screen, after the centre) and the understanding that Synthesis is the second most important single mode for the twitter service.

The con­tent of the indi­vid­ual updates attracts and retains users’ atten­tion very effec­tively: the major­ity of the actions a user may want to take in regard to a tweet (or any of the related con­structs in twitter’s con­cept model such as con­ver­sa­tions, hash tags, pro­files, linked media, etc.) are directly avail­able from the inter­ac­tion com­po­nent.  In some cases, these actions are pre­sented via modal or light­box pre­view, wherein the user’s focus is ‘forced’ onto a sin­gle ele­ment – thus main­tain­ing the pri­macy of the stream.  In oth­ers, links lead to des­ti­na­tion pages that switch the user’s focus to a dif­fer­ent sub­ject – another user’s pro­file, for exam­ple – but in most of these cases the struc­ture of the work­space remains con­sis­tent: a two col­umn body sur­mounted by the ubiq­ui­tous header bar. There is lit­tle need to look else­where in the work­space, unless the user needs to check the sta­tus of one of the broader aspects of their account, at which point the header bar pro­vides appro­pri­ate func­tion­al­ity as dis­cussed above.

The absence of a page footer – scrolling is ‘infinite’ on the primary pages of twitter.com – reflects the conscious decision to convey updates as an endless, dynamic stream.  This encourages users to continue scrolling, increasing Exploration activity, and enhancing users’ Comprehension of additional updates – which benefits twitter’s business by increasing the attention users direct toward the service.

Although the two-tier, stream-centred struc­ture of twitter’s pri­mary work­spaces remains con­sis­tent, there are vari­a­tions in the com­po­si­tion of the left col­umn (Fig­ure 20). On the Home page, for exam­ple, the left col­umn offers four sep­a­rate com­po­nents. The first is a sum­mary of the user’s pro­file, includ­ing a pro­file image, a link to their pro­file page, counts of their tweets, fol­low­ers, and the peo­ple they fol­low, and a ‘com­pose new tweet’ box.  This is another exam­ple of a com­po­nent sup­port­ing a com­pos­ite of modes.

Fig. 20. Twitter Home Page – Left Column

The core pur­pose is to enable users to Mon­i­tor the most impor­tant aspects of their own account via the counts.  The links pro­vide direct Locate func­tion­al­ity for fol­low­ers, tweets, and accounts the user fol­lows; and also serve as a point of depar­ture for the same mode chains that can be ini­ti­ated from the header bar.  The ‘com­pose new tweet’ func­tion encour­ages users to cre­ate updates, under­lin­ing the impor­tance of Syn­the­sis as the source of new con­tent within the twit­ter network.

User Expe­ri­ence Architecture

The twitter.com expe­ri­ence is intended to sup­port a set of user jour­neys con­sist­ing largely of search and dis­cov­ery tasks which cor­re­spond with spe­cific mon­i­tor­ing and search-related mode chains. Fur­ther, we can see that pat­terns of recur­rence, inter­sec­tion, over­lap, and sequenc­ing in the aggre­gate set of search and dis­cov­ery modes are sub­stan­tially reflected in twitter’s user expe­ri­ence architecture.

From a structural design perspective, the core [16] of the twitter.com user experience architecture is a set of four interaction consoles, each of which focuses on monitoring a distinct stream of updates around the most important facets of the twitter.com concept model: the content and activities of people in the user’s personal network (Home); interactions with other users (Interactions); the user’s profile (@Me); and a digest of content from all users in the twitter.com network (Discover) (Figure 21).

The core monitoring consoles are supported by screens that assist and encourage users to expand their personal networks through location and exploration tools; these include ‘Find friends’, ‘Who to follow’, ‘Browse categories’, and the search results page.

Fig. 21. Twitter.com Discover Workspace

Spe­cific land­ing pages pro­vide mon­i­tor­ing and cura­tion tools for the dif­fer­ent types of rela­tion­ships users can estab­lish in the social graph: fol­low and un-follow, fol­low­ers and fol­low­ing, pub­lic and pri­vate accounts, list mem­ber­ships, etc.  A small set of screens pro­vides func­tion­al­ity for admin­is­ter­ing the user’s account, such as ‘Settings’.

Under­ly­ing this user expe­ri­ence archi­tec­ture is a con­cept model con­sist­ing pri­mar­ily of a small set of social objects – tweets, con­ver­sa­tions, pro­files, shared dig­i­tal assets, and lists thereof – linked together by search and dis­cov­ery verbs. A rel­a­tively sim­ple infor­ma­tion archi­tec­ture estab­lishes the set of cat­e­gories used to iden­tify these objects by topic, sim­i­lar­ity, and con­tent (Fig­ure 22).

In its holis­tic and gran­u­lar aspects, the twit­ter user expe­ri­ence archi­tec­ture aligns well with users’ men­tal mod­els for build­ing a pro­file and par­tic­i­pat­ing in an ongo­ing stream of con­ver­sa­tions. How­ever, what emerges quite quickly from analy­sis of the twit­ter con­cept model and user expe­ri­ence archi­tec­ture is the role of search and dis­cov­ery modes in both atomic and com­pos­ite forms at every level of twitter’s design. Rather than merely sub­sum­ing modes as part of some larger activ­ity, many of the most com­mon actions users can take with twitter’s core inter­ac­tion objects cor­re­spond directly to modes themselves.

Fig. 22. Twitter.com User Experience Architecture

The indi­vid­ual tweet com­po­nent is a prime exam­ple: the sum­maries of author pro­files and their recent activ­ity are a com­pos­ite of the Locate, Explore and Com­pre­hend modes (Fig­ure 23). Evi­dently, the pre­sen­ta­tion, labelling, and inter­ac­tion design may reflect adap­ta­tions spe­cific to the lan­guage and men­tal model of the twit­ter envi­ron­ment, but the activ­i­ties are clearly rec­og­niz­able. The ‘Show con­ver­sa­tion’ func­tion dis­cussed above also reflects direct sup­port to Locate, Explore and Com­pre­hend a con­ver­sa­tion object as a sin­gle interaction.

Fig. 23. Twitter Profile Summary

Because the twitter.com expe­ri­ence is so strongly cen­tred on sense-making, search and dis­cov­ery modes often directly con­sti­tute the activ­ity paths con­nect­ing one object to another within the user expe­ri­ence archi­tec­ture.  In this sense, the modes and chains could be said to act as a ‘skele­ton’ for twitter.com, and are directly vis­i­ble to an unprece­dented degree in the inter­ac­tion design built on that skeleton.

6 Discussion

The model described in this paper encompasses a range of information seeking behaviours, from elementary lookup tasks through to more complex problem-solving activities. However, the model could also be framed as part of a broader set of information behaviours, extending from ‘acquisition’ oriented tasks at one end of the spectrum to ‘usage’ oriented activities at the other (Figure 24). In this context, modes can span more than one phase. For example, Explore entails a degree of interaction coupled with the anticipation of further discovery, i.e. acquisition.  Likewise, Evaluate implies a degree of interaction in the pursuit of some higher goal or purpose to which the output will be put, i.e. usage.

It would appear that, with the possible exception of Synthesize, there are no exclusively usage-oriented behaviours in the model. This may suggest that the model is in some senses incomplete, or may simply reflect the context in which the data was acquired and the IR-centric processes by which it was analysed.

Reduc­ing the ‘scope’ of the model such that modes serve only as descrip­tors of dis­tilled sense-making activ­ity inde­pen­dent of con­text (such as the user’s over­all goal and the nature of the infor­ma­tion assets involved) may help clar­ify the rela­tion­ship between acqui­si­tion, inter­ac­tion and usage phases. In this per­spec­tive, there appears to be a form of ‘par­al­lelism’ in effect; with users simul­ta­ne­ously under­tak­ing activ­i­ties focused on an over­all goal, such as Eval­u­at­ing the qual­ity of a finan­cial instru­ment, while also per­form­ing activ­i­ties focused on nar­rower information-centred objec­tives such as Locat­ing and Ver­i­fy­ing the util­ity of the infor­ma­tion assets nec­es­sary for them to com­plete the Eval­u­a­tion.  These ‘par­al­lel’ sets of activ­i­ties – one focused on infor­ma­tion assets in ser­vice to a larger goal, and the other focused on the goal itself – can be use­fully described in terms of modes, and what is more impor­tant, seem inter­twined in the minds of users as they artic­u­late their dis­cov­ery needs.

Fig. 24. From information acquisition to information use

A key fea­ture of the cur­rent model is its empha­sis on the com­bi­na­to­r­ial nature of search modes, and the value this offers as a frame­work for express­ing com­plex pat­terns of behav­iour. Evi­dently, such an approach is not unique: Makri (2008), for exam­ple, has also pre­vi­ously explored the con­cept of mode chains to describe infor­ma­tion seek­ing behav­iours observed in nat­u­ral­is­tic set­tings. How­ever, his approach was based on the analy­sis of com­plex tasks observed in real time, and as such was less effec­tive in reveal­ing con­sis­tent pat­terns of atomic behav­iour such as those found in the cur­rent study.

Con­versely, this virtue can also be a short­com­ing: the fact that sim­ple repeat­ing pat­terns can be extracted from the data may be as much an arte­fact of the medium as it is of the infor­ma­tion needs it con­tains. These sce­nar­ios were expressly designed to be a con­cise, self-contained deliv­er­able in their own right, and applied as a sim­ple but effec­tive tool in the plan­ning and pri­ori­ti­sa­tion of soft­ware devel­op­ment activ­i­ties. This places a limit on the length and sophis­ti­ca­tion of the infor­ma­tion needs they encap­su­late, and a nat­ural bound­ary on the scope and extent of the pat­terns they rep­re­sent. Their for­mat also allows a researcher to apply per­haps an unre­al­is­tic degree of top-down judge­ment and iter­a­tion in align­ing the rel­a­tive gran­u­lar­ity of the infor­ma­tion needs to exist­ing modes; a ben­e­fit that is less read­ily avail­able to those whose approach involves real-time, obser­va­tional data.

A fur­ther caveat is that in order to progress from under­stand­ing an infor­ma­tion need to iden­ti­fy­ing the infor­ma­tion behav­iours required to sat­isfy those needs, it is nec­es­sary to spec­u­late on the behav­iours that a user might per­form when under­tak­ing a task to sat­isfy the need. It may tran­spire that users actu­ally per­form dif­fer­ent behav­iours which achieve the same end, or per­form the expected behav­iour but through a com­bi­na­tion of other nested behav­iours, or may sim­ply sat­isfy the need in a way that had not been envis­aged at all.

Evi­dently, the process of infer­ring infor­ma­tion behav­iour from self-reported needs can never be wholly deter­min­is­tic, regard­less of the con­sis­tency mea­sures dis­cussed in Sec­tion 3.1. In this respect, fur­ther steps should be taken to oper­a­tional­ize the process and develop some inde­pen­dent mea­sure of sta­bil­ity or objec­tiv­ity in its usage, so that its value and insights can extend reli­ably to the wider research community.

The compositional behaviour of the modes suggests further open questions and avenues for research. One of these is the nature of compositionality itself: on the one hand it could be thought of as a pseudo-linguistic grammar, with bigrams and trigrams of modes that combine in turn to form larger sequences, analogous to coherent “sentences”. In this context, the modes act as verbs, while the associated objects (users, information assets, processes etc.) become the nouns. The occurrence of distinct ‘opening’ and ‘closing’ modes in the scenarios would seem to further support this view. However, in some scenarios the transitions between the modes are far less apparent, and instead they could be seen as applying in parallel, like notes combining in harmony to form a musical chord. In both cases, the degree and nature of any such compositional rules need further empirical investigation. This may reveal other dependencies yet to be observed, such as the possibility alluded to earlier of higher-level behaviours requiring the completion of certain lower level modes before they themselves can terminate.
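The ‘grammar’ reading of mode composition can be made concrete by counting n-grams over mode sequences. The sketch below is illustrative only: the function names are hypothetical, and the three scenarios are invented for the example rather than drawn from the study data.

```python
from collections import Counter


def mode_ngrams(sequence, n):
    """Return the consecutive n-grams of modes in one scenario's mode sequence."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def count_ngrams(scenarios, n):
    """Count how often each mode n-gram occurs across all scenarios."""
    counts = Counter()
    for sequence in scenarios:
        counts.update(mode_ngrams(sequence, n))
    return counts


# Hypothetical mode sequences, invented for illustration (not study data).
scenarios = [
    ["Monitor", "Explore", "Evaluate"],
    ["Locate", "Explore", "Evaluate"],
    ["Monitor", "Explore", "Comprehend"],
]

bigrams = count_ngrams(scenarios, 2)
# 'Opening' and 'closing' modes are the first and last mode of each sequence.
openers = Counter(sequence[0] for sequence in scenarios)
closers = Counter(sequence[-1] for sequence in scenarios)

if __name__ == "__main__":
    print(bigrams.most_common())
    print(openers, closers)
```

A count of this kind only captures the sequential reading. The parallel, ‘chord’-like reading would need a different representation, such as sets of co-occurring modes.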

The process of map­ping from modes to design inter­ven­tions also reveals fur­ther obser­va­tions on the util­ity of infor­ma­tion mod­els in gen­eral. Despite their evi­dent value as ana­lyt­i­cal frame­works and their pop­u­lar­ity among researchers (Bates’ Berryp­ick­ing model has been cited over 1,000 times, for exam­ple), few have gained sig­nif­i­cant trac­tion within the design com­mu­nity, and fewer still are adopted as part of the main­stream work­ing prac­tices of sys­tem design practitioners.

In part, this may be sim­ply a reflec­tion of imper­fect chan­nels of com­mu­ni­ca­tion between the research and design com­mu­ni­ties. How­ever, it may also reflect a grow­ing con­cep­tual gap between research insights on the one hand and cor­re­spond­ing design inter­ven­tions on the other. It is likely that the most valu­able the­o­ret­i­cal mod­els will need to strike a bal­ance between flex­i­bil­ity (the abil­ity to address a vari­ety of domains and prob­lems), gen­er­a­tive power (the abil­ity to express com­plex pat­terns of behav­iour) and an appro­pri­ate level of abstrac­tion (such that design insights are read­ily avail­able; or may be inferred with min­i­mal speculation).

7 Conclusions

In this paper, we have exam­ined the needs and behav­iours of indi­vid­u­als across a wide range of search and dis­cov­ery sce­nar­ios. We have pro­posed a model of infor­ma­tion seek­ing behav­iour which has at its core a set of modes that peo­ple reg­u­larly employ to sat­isfy their infor­ma­tion needs. In so doing, we explored a novel, goal-driven approach to elic­it­ing user needs, and iden­ti­fied some key dif­fer­ences in user behav­iour between site search and enter­prise search.

In addi­tion, we have demon­strated the value of the model as a frame­work for express­ing com­plex pat­terns of search behav­iour, extend­ing the IR con­cept of information-seeking to embrace a broader range of infor­ma­tion inter­ac­tion and use behav­iours. We pro­pose that our approach can be adopted by other researchers who want to adopt a ‘needs first’ per­spec­tive to under­stand­ing infor­ma­tion behaviour.

By illus­trat­ing ways in which indi­vid­ual modes are sup­ported in exist­ing search appli­ca­tions, we have made a prac­ti­cal con­tri­bu­tion that helps bridge the gap between inves­ti­gat­ing search behav­iour and design­ing appli­ca­tions to sup­port such behav­iour. In par­tic­u­lar, we have demon­strated how modes can serve as an effec­tive design tool across var­ied lev­els of sys­tem design: con­cept model, UX archi­tec­ture, inter­ac­tion design, and visual design.


  1. Bates, Mar­cia J. 1979. Infor­ma­tion Search Tac­tics. Jour­nal of the Amer­i­can Soci­ety for Infor­ma­tion Sci­ence 30, 205–214.
  2. Cool, C. & Belkin, N. 2002. A clas­si­fi­ca­tion of inter­ac­tions with infor­ma­tion. In H. Bruce (Ed.), Emerg­ing Frame­works and Meth­ods: CoLIS4: pro­ceed­ings of the 4th Inter­na­tional Con­fer­ence on Con­cep­tions of Library and Infor­ma­tion Sci­ence, Seat­tle, WA, USA, July 21–25, 1–15.
  3. Ellis, D.  1989.  A Behav­ioural Approach to Infor­ma­tion Retrieval Sys­tem Design.  Jour­nal of Doc­u­men­ta­tion, 45(3), 171–212.
  4. Ellis, D., Cox, D. & Hall, K. 1993.  A Com­par­i­son of the Information-seeking Pat­terns of Researchers in the Phys­i­cal and Social Sci­ences.  Jour­nal of Doc­u­men­ta­tion 49(4), 356–369.
  5. Ellis, D. & Hau­gan, M. 1997.  Mod­el­ling the Information-seeking Pat­terns of Engi­neers and Research Sci­en­tists in an Indus­trial Envi­ron­ment.  Jour­nal of Doc­u­men­ta­tion 53(4), pp. 384–403.
  6. Hobbs, J. (2005) An introduction to user journeys. Boxes & Arrows. [Available: http://www.boxesandarrows.com/an-introduction-to-user-journeys/]
  7. Kalbach, J. (2012). Designing Screens Using Cores and Paths. Boxes & Arrows. [Available: http://www.boxesandarrows.com/designing-screens-using-cores-and-paths/]
  8. Lamantia, J. 2006. 10 Information Retrieval Patterns. JoeLamantia.com. [Available: http://www.joelamantia.com/information-architecture/10-information-retrieval-patterns]
  9. Laman­tia, J. (2009). Cre­at­ing Suc­cess­ful Por­tals with a Design Frame­work. Inter­na­tional Jour­nal of Web Por­tals (IJWP), 1(4), 63–75. doi:10.4018/jwp.2009071305
  10. Makri, S., Bland­ford, A. & Cox, A.L. 2008.  Inves­ti­gat­ing the Information-Seeking Behav­iour of Aca­d­e­mic Lawyers:  From Ellis’s Model to Design. Infor­ma­tion Pro­cess­ing and Man­age­ment 44(2), 613–634.
  11. Mar­chion­ini, G. 2006. Exploratory search: from find­ing to under­stand­ing. Com­mu­ni­ca­tions of the ACM 49(4), 41–46.
  12. Meho, L. & Tibbo, H. 2003.  Mod­el­ing the Information-seeking Behav­ior of Social Sci­en­tists: Ellis’s Study Revis­ited.  Jour­nal of the Amer­i­can Soci­ety for Infor­ma­tion Sci­ence and Tech­nol­ogy 54(6), 570–587.
  13. O’Day, V. & Jeffries, R. 1993. Orienteering in an Information Landscape: How Information Seekers get from Here to There. INTERCHI 1993, 438–445.
  14. Rose, D. and Levinson, D. 2004. Understanding user goals in web search, Proceedings of the 13th international conference on World Wide Web, New York, NY, USA.
  15. Russell-Rose, T., Lamantia, J. and Burrell, M. 2011. A Taxonomy of Enterprise Search and Discovery. Proceedings of HCIR 2011, California, USA.
  16. Russell-Rose, T. and Makri, S. (2012). A Model of Con­sumer Search Behav­ior. Pro­ceed­ings of Euro­HCIR 2012, Nijmegen, Nether­lands.
  17. Spencer, D. 2006. Four Modes of Seeking Information and How to Design for Them. Boxes & Arrows. [Available: www.boxesandarrows.com/view/four_modes_of_seeking_information_and_how_to_design_for_them]



Data Science Highlights: An Investigation of the Discipline

March 28th, 2014 — 1:26pm

I’ve posted a substantial readout summarizing some of the more salient findings from a long-running programmatic research program into data science. This deck shares synthesized findings around many of the facets of data science as a discipline, including practices, workflow, tools, org models, skills, etc. This readout distills a very wide range of inputs, including: direct interviews, field-based ethnography, community participation (real-world and on-line), secondary research from industry and academic sources, analysis of hiring and investment activity in data science over several years, descriptive and definitional artifacts authored by practitioners / analysts / educators and other external actors, media coverage of data science, historical antecedents, the structure and evolution of professional disciplines, and even more.

I consider it a sort of business-anthropology-style investigation of data science, conducted from the viewpoint of product making’s primary aspects: strategy, management, design, and delivery.

I learned a great deal dur­ing the course of this effort, and expect to con­tinue to learn, as data sci­ence will con­tinue to evolve rapidly for the next sev­eral years.

Data sci­ence prac­ti­tion­ers look­ing at this mate­r­ial are invited to pro­vide feed­back about where these mate­ri­als are accu­rate or inac­cu­rate, and most espe­cially about what is miss­ing, and what is com­ing next for this very excit­ing field.






Data Science and Empirical Discovery: A New Discipline Pioneering a New Analytical Method

March 26th, 2014 — 11:00am

One of the essen­tial pat­terns of sci­ence and indus­try in the mod­ern era is that new meth­ods for under­stand­ing — what I’ll call sense­mak­ing from now on — often emerge hand in hand with new pro­fes­sional and sci­en­tific dis­ci­plines.  This link­age between new dis­ci­plines and new meth­ods fol­lows from the  decep­tively sim­ple imper­a­tive to real­ize new types of insight, which often means analy­sis of new kinds of data, using new tech­niques, applied from newly defined per­spec­tives. New view­points and new ways of under­stand­ing are lit­er­ally bound together in a sort of symbiosis.

One famil­iar exam­ple of this dynamic is the rapid devel­op­ment of sta­tis­tics dur­ing the 18th and 19th cen­turies, in close par­al­lel with the rise of new social sci­ence dis­ci­plines includ­ing eco­nom­ics (orig­i­nally polit­i­cal econ­omy) and soci­ol­ogy, and nat­ural sci­ences such as astron­omy and physics.  On a very broad scale, we can see the pat­tern in the tan­dem evo­lu­tion of the sci­en­tific method for sense­mak­ing, and the cod­i­fi­ca­tion of mod­ern sci­en­tific dis­ci­plines based on pre­cur­sor fields such as nat­ural his­tory and nat­ural phi­los­o­phy dur­ing the sci­en­tific rev­o­lu­tion.

Today, we can see this pattern clearly in the simultaneous emergence of Data Science as a new and distinct discipline accompanied by Empirical Discovery, the new sensemaking and analysis method Data Science is pioneering.  Given its dramatic rise to prominence recently, declaring Data Science a new professional discipline should inspire little controversy. Declaring Empirical Discovery a new method may seem bolder, but when we keep the essential pattern of new disciplines appearing in tandem with new sensemaking methods in mind, it is more controversial to suggest Data Science is a new discipline that lacks a corresponding new method for sensemaking.  (I would argue it is the method that makes the discipline, not the other way around, but that is a topic for fuller treatment elsewhere.)

What is empir­i­cal dis­cov­ery?  While empir­i­cal dis­cov­ery is a new sense­mak­ing method, we can build on two exist­ing foun­da­tions to under­stand its dis­tin­guish­ing char­ac­ter­is­tics, and help craft an ini­tial def­i­n­i­tion.  The first of these is an under­stand­ing of the empir­i­cal method. Con­sider the fol­low­ing description:

“The empirical method is not sharply defined and is often contrasted with the precision of the experimental method, where data are derived from the systematic manipulation of variables in an experiment.  …The empirical method is generally characterized by the collection of a large amount of data before much speculation as to their significance, or without much idea of what to expect, and is to be contrasted with more theoretical methods in which the collection of empirical data is guided largely by preliminary theoretical exploration of what to expect. The empirical method is necessary in entering hitherto completely unexplored fields, and becomes less purely empirical as the acquired mastery of the field increases. Successful use of an exclusively empirical method demands a higher degree of intuitive ability in the practitioner.”

Data Sci­ence as prac­ticed is largely con­sis­tent with this pic­ture.  Empir­i­cal pre­rog­a­tives and under­stand­ings shape the pro­ce­dural plan­ning of Data Sci­ence efforts, rather than the­o­ret­i­cal con­structs.  Semi-formal approaches pre­dom­i­nate over explic­itly cod­i­fied meth­ods, sig­nal­ing the impor­tance of intu­ition.  Data sci­en­tists often work with data that is on-hand already from busi­ness activ­ity, or data that is newly gen­er­ated through nor­mal busi­ness oper­a­tions, rather than seek­ing to acquire wholly new data that is con­sis­tent with the design para­me­ters and goals of for­mal exper­i­men­tal efforts.  Much of the sense­mak­ing activ­ity around data is explic­itly exploratory (what I call the ‘pan­ning for gold’ stage of evo­lu­tion — more on this in sub­se­quent post­ings), rather than sys­tem­atic in the manip­u­la­tion of known vari­ables.  These exploratory tech­niques are used to address rel­a­tively new fields such as the Inter­net of Things, wear­ables, and large-scale social graphs and col­lec­tive activ­ity domains such as instru­mented envi­ron­ments and the quan­ti­fied self.  These new domains of appli­ca­tion are not mature in ana­lyt­i­cal terms; ana­lysts are still work­ing to iden­tify the most effec­tive tech­niques for yield­ing insights from data within their bounds.

The second relevant perspective is our understanding of discovery as an activity that is distinct and recognizable in comparison to generalized analysis: from this, we can summarize discovery as sensemaking intended to arrive at novel insights, through exploration and analysis of diverse and dynamic data in an iterative and evolving fashion.

Looking deeper, one specific characteristic of discovery as an activity is the absence of formally articulated statements of belief and expected outcomes at the beginning of most discovery efforts.  Another is the iterative nature of discovery efforts, which can change course in non-linear ways and even ‘backtrack’ on the way to arriving at insights: both the data and the techniques used to analyze data change during discovery efforts.  Formally defined experiments are much more clearly determined from the beginning, and their definition is less open to change during their course. A program of related experiments conducted over time may show iterative adaptation of goals, data and methods, but the individual experiments themselves are not malleable and dynamic in the fashion of discovery.  Discovery’s emphasis on novel insight as preferred outcome is another important characteristic; by contrast, formal experiments are repeatable and verifiable by definition, and the degree of repeatability is a criterion of well-designed experiments.  Discovery efforts often involve an intuitive shift in perspective that is recountable and retraceable in retrospect, but cannot be anticipated.

Build­ing on these two foun­da­tions, we can define Empir­i­cal Dis­cov­ery as a hybrid, pur­pose­ful, applied, aug­mented, iter­a­tive and serendip­i­tous method for real­iz­ing novel insights for busi­ness, through analy­sis of large and diverse data sets.

Let’s look at these facets in more detail.

Empir­i­cal dis­cov­ery pri­mar­ily addresses the prac­ti­cal goals and audi­ences of busi­ness (or indus­try), rather than sci­en­tific, aca­d­e­mic, or the­o­ret­i­cal objec­tives.  This is tremen­dously impor­tant, since  the prac­ti­cal con­text impacts every aspect of Empir­i­cal Discovery.

‘Large and diverse data sets’ reflects the fact that Data Science practitioners engage with Big Data as we currently understand it; situations in which the confluence of data types and volumes exceeds the capabilities of business analytics to practically realize insights in terms of tools, infrastructure, practices, etc.

Empirical discovery uses a rapidly evolving hybridized toolkit, blending a wide range of general and advanced statistical techniques with sophisticated exploratory and analytical methods from a wide variety of sources that includes data mining, natural language processing, machine learning, neural networks, Bayesian analysis, and emerging techniques such as topological data analysis and deep learning.

What’s most notable about this hybrid toolkit is that Empirical Discovery does not originate novel analysis techniques; it borrows tools from established disciplines such as information retrieval, artificial intelligence, computer science, and the social sciences. Many of the more specialized or apparently exotic techniques data science and empirical discovery rely on, such as support vector machines, deep learning, or measuring mutual information in data sets, have established histories of usage in academic or other industry settings, and have reached reasonable levels of maturity. Empirical discovery’s hybrid toolkit is transposed from one domain of application to another, rather than invented.

Empir­i­cal Dis­cov­ery is an applied method in the same way Data Sci­ence is an applied dis­ci­pline: it orig­i­nates in and is adapted to busi­ness con­texts, it focuses on arriv­ing at use­ful insights to inform busi­ness activ­i­ties, and it is not used to con­duct basic research.  At this early stage of devel­op­ment, Empir­i­cal Dis­cov­ery has no inde­pen­dent and artic­u­lated the­o­ret­i­cal basis and does not (yet) advance a dis­tinct body of knowl­edge based on the­ory or prac­tice. All viable dis­ci­plines have a body of knowl­edge, whether for­mal or infor­mal, and applied dis­ci­plines have only their cumu­la­tive body of knowl­edge to dis­tin­guish them, so I expect this to change.

Empir­i­cal dis­cov­ery is not only applied, but explic­itly pur­pose­ful in that it is always set in motion and directed by an agenda from a larger con­text, typ­i­cally the spe­cific busi­ness goals of the orga­ni­za­tion act­ing as a prime mover and fund­ing data sci­ence posi­tions and tools.  Data Sci­ence prac­ti­tion­ers effect Empir­i­cal Dis­cov­ery by mak­ing it hap­pen on a daily basis — but wher­ever there is empir­i­cal dis­cov­ery activ­ity, there is sure to be inten­tion­al­ity from a busi­ness view.  For exam­ple, even in orga­ni­za­tions with a for­mal hack time pol­icy, our research sug­gests there is lit­tle or no com­pletely undi­rected or self-directed empir­i­cal dis­cov­ery activ­ity, whether con­ducted by for­mally rec­og­nized Data Sci­ence prac­ti­tion­ers, busi­ness ana­lysts, or others.

One very important implication of the situational purposefulness of Empirical Discovery is that there is no direct imperative for generating a body of cumulative knowledge through original research: the insights that result from Empirical Discovery efforts are judged by their practical utility in an immediate context. There is also no explicit scientific burden of proof or verifiability associated with Empirical Discovery within its primary context of application. Many practitioners encourage some aspects of verifiability, for example, by annotating the various sources of data used for their efforts and the transformations involved in wrangling data on the road to insights or data products, but this is not a requirement of the method. Another implication is that empirical discovery does not adhere to any explicit moral, ethical, or value-based missions that transcend working context. While Data Scientists often interpret their role as transformative, this is in reference to business. Data Science is not medicine, for example, with a Hippocratic oath.

Empirical Discovery is an augmented method in that it depends on computing and machine resources to increase human analytical capabilities: it is simply impractical for people to manually undertake many of the analytical techniques common to Data Science. An important point to remember about augmented methods is that they are not automated; people remain necessary, and it is the combination of human and machine that is effective at yielding insights. In the problem domain of discovery, the patterns of sensemaking activity leading to insight are intuitive, non-linear, and associative; activities with these characteristics are not fully automatable with current technology. And while many analytical techniques can be usefully automated within boundaries, these tasks typically make up just a portion of a complete discovery effort. For example, using latent class analysis to explore a machine-sampled subset of a larger data corpus is task-specific automation complementing human perspective at particular points of the Empirical Discovery workflow. This dependence on machine augmented analytical capability is recent within the history of analytical methods. In most of the modern era (roughly the later 17th, 18th, 19th and early 20th centuries) the data employed in discovery efforts was manageable ‘by hand’, even when using the newest mathematical and analytical methods emerging at the time. This remained true until the effective commercialization of machine computing ended the need for human computers as a recognized role in the middle of the 20th century.

The real­ity of most ana­lyt­i­cal efforts — even those with good ini­tial def­i­n­i­tion — is that insights often emerge in response to and in tan­dem with chang­ing and evolv­ing ques­tions which were not iden­ti­fied, or per­haps not even under­stood, at the out­set.  Dur­ing dis­cov­ery efforts, ana­lyt­i­cal goals and tech­niques, as well as the data under con­sid­er­a­tion, often shift in unpre­dictable ways, mak­ing the path to insight dynamic and non-linear.  Fur­ther, the sources of and inspi­ra­tions for insight are  dif­fi­cult or impos­si­ble to iden­tify both at the time and in ret­ro­spect. Empir­i­cal dis­cov­ery addresses the com­plex and opaque nature of dis­cov­ery with iter­a­tion and adap­ta­tion, which com­bine  to set the stage for serendip­ity.

With this initial definition of Empirical Discovery in hand, the natural question is what this means for Data Science and business analytics. Three things stand out for me. First, I think one of the central roles played by Data Science is in pioneering the application of existing analytical methods from specialized domains to serve general business goals and perspectives, seeking effective ways to work with the new types (graph, sensor, social, etc.) and tremendous volumes (yotta, yotta, yotta…) of business data at hand in the Big Data moment and realize insights.

Second, following from this, Empirical Discovery is a methodological framework within and through which a great variety of analytical techniques at differing levels of maturity and from other disciplines are vetted for business analytical utility in iterative fashion by Data Science practitioners.

And third, it seems this vet­ting func­tion is delib­er­ately part of the makeup of empir­i­cal dis­cov­ery, which I con­sider a very clever way to cre­ate a feed­back loop that enhances Data Sci­ence prac­tice by using Empir­i­cal Dis­cov­ery as a dis­cov­ery tool for refin­ing its own methods.

Comment » | Big Data

Big Data is a Condition (Or, "It's (Mostly) In Your Head")

March 10th, 2014 — 1:07pm

Unsurprisingly, definitions of Big Data run the gamut from the turgid to the flip, making room to include the trite, the breathless, and the simply uninspiring in the big circle around the campfire. Some of these definitions are useful in part, but none of them captures the essence of the matter. Most are mistakes in kind, trying to ground and capture Big Data as a ‘thing’ of some sort that is measurable in objective terms. Anytime you encounter a number, this is the school of thought.

Some approach Big Data as a state of being, most often a sim­ple oper­a­tional state of insuf­fi­ciency of some kind; typ­i­cally resources like ana­lysts, com­pute power or stor­age for han­dling data effec­tively; occa­sion­ally some­thing less quan­tifi­able like clar­ity of pur­pose and cri­te­ria for man­age­ment. Any­time you encounter phras­ing that relies on the reader to inter­pret and define the par­tic­u­lars of the insuf­fi­ciency, this is the school of thought.

I see Big Data as a self-defined (per­haps diag­nosed is more accu­rate) con­di­tion, but one that is based on idio­syn­cratic inter­pre­ta­tion of cur­rent and pos­si­ble future sit­u­a­tions in which under­stand­ing of, plan­ning for, and activ­ity around data are central.

Here’s my working definition: Big Data is the condition in which very high actual or expected difficulty in working successfully with data combines with very high anticipated but unknown value and benefit, leading to the a priori assumption that currently available information management and analytical capabilities are broadly insufficient, making new and previously unknown capabilities seemingly necessary.

Comment » | Big Data

Strata New York Video: Designing Big Data Interactions With the Language of Discovery

December 6th, 2013 — 12:41pm

I’m late to making it available here, but O’Reilly Media published the video recording of my presentation on The Language of Discovery: A Toolkit For Designing Big Data Interactions from last year’s (2012) Strata conference in NY.

Look­ing back at this, I’m happy to say that while my think­ing on sev­eral of the key ideas has advanced quite a bit in the past 12 months (see our more recent mate­ri­als), the core ideas and con­cepts remain vital.

Those are, briefly:

  • Big Data is use­less unless peo­ple can engage with it effectively
  • Dis­cov­ery is a crit­i­cal and inad­e­quately acknowl­edged aspect of sense mak­ing that is core to real­iz­ing value from Big Data
  • Dis­cov­ery is lit­er­ally the most impor­tant human/machine inter­ac­tion in the emerg­ing Age of Insight
  • Pro­vid­ing dis­cov­ery capa­bil­ity requires under­stand­ing people’s needs and goals
  • The Lan­guage of Dis­cov­ery is an effec­tive tool for under­stand­ing dis­cov­ery needs and activ­i­ties, and design­ing solutions
  • There are known pat­terns and struc­ture in dis­cov­ery activ­i­ties that you can use to cre­ate dis­cov­ery solutions

I’ve posted it to Vimeo for easier viewing. Slides are here http://www.joelamantia.com/user-experience-ux/strata-new-york-slides-new-discovery-patterns for those who wish to follow along. Enjoy!



Comment » | Language of Discovery

Understanding Data Science: Two Recent Studies

October 22nd, 2013 — 7:40am

If you need a deeper understanding of data science than Drew Conway’s popular Venn diagram model, or Josh Wills’ tongue-in-cheek characterization, “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician,” two relatively recent studies are worth reading.

‘Analyzing the Analyzers,’ an O’Reilly e-book by Harlan Harris, Sean Patrick Murphy, and Marck Vaisman, suggests four distinct types of data scientists (effectively personas, in a design sense) based on analysis of self-identified skills among practitioners. The scenario format dramatizes the different personas, making what could be a dry statistical readout of survey data more engaging. The survey-only nature of the data, the restriction of scope to just skills, and the suggested models of skill-profiles make this feel like the sort of exercise that data scientists undertake as an everyday task: collecting data, analyzing it using a mix of statistical techniques, and sharing the model that emerges from the data mining exercise. That’s not an indictment, simply an observation about the consistent feel of the effort as a product of data scientists, about data science.

And the paper ‘Enterprise Data Analysis and Visualization: An Interview Study’ by researchers Sean Kandel, Andreas Paepcke, Joseph Hellerstein, and Jeffrey Heer considers data science within the larger context of industrial data analysis, examining analytical workflows, skills, and the challenges common to enterprise analysis efforts, and identifying three archetypes of data scientist. As an interview-based study, the data the researchers collected is richer, and there’s correspondingly greater depth in the synthesis. The scope of the study included a broader set of roles than data scientist (enterprise analysts) and involved questions of workflow and organizational context for analytical efforts in general. I’d suggest this is useful as a primer on analytical work and workers in enterprise settings for those who need a baseline understanding; it also offers some genuinely interesting nuggets for those already familiar with discovery work.

We’ve under­taken a con­sid­er­able amount of research into dis­cov­ery, ana­lyt­i­cal work/ers, and data sci­ence over the past three years — part of our pro­gram­matic approach to lay­ing a foun­da­tion for prod­uct strat­egy and high­light­ing inno­va­tion oppor­tu­ni­ties — and both stud­ies com­ple­ment and con­firm much of the direct research into data sci­ence that we con­ducted. There were a few impor­tant dif­fer­ences in our find­ings, which I’ll share and dis­cuss in upcom­ing posts.

Comment » | Language of Discovery, User Research

Defining Discovery: Core Concepts

October 18th, 2013 — 12:33pm

Discovery tools have had a referenceable working definition since at least 2001, when Ben Shneiderman published ‘Inventing Discovery Tools: Combining Information Visualization with Data Mining’. Dr. Shneiderman suggested the combination of the two distinct fields of data mining and information visualization could manifest as a new category of tools for discovery, an understanding that remains essentially unaltered over ten years later. An industry analyst report titled ‘Visual Discovery Tools: Market Segmentation and Product Positioning’ from March of this year, for example, reads, “Visual discovery tools are designed for visual data exploration, analysis and lightweight data mining.”

Tools should fol­low from the activ­i­ties peo­ple under­take (a foun­da­tional tenet of activ­ity cen­tered design), how­ever, and Dr. Shnei­der­man does not in fact describe or define dis­cov­ery activ­ity or capa­bil­ity. As I read it, dis­cov­ery is assumed to be the implied sum of the sep­a­rate fields of visu­al­iza­tion and data min­ing as they were then under­stood.  As a work­ing def­i­n­i­tion that cat­alyzes a field of prod­uct pro­to­typ­ing, it’s ade­quate in the short term.  In the long term, it makes the bound­aries of dis­cov­ery both derived and tem­po­rary, and leaves a sub­stan­tial gap in the land­scape of core con­cepts around dis­cov­ery, mak­ing con­sen­sus on the nature of most aspects of dis­cov­ery dif­fi­cult or impos­si­ble to reach.  I think this def­i­n­i­tional gap is a major rea­son that dis­cov­ery is still an ambigu­ous prod­uct landscape.

To help close that gap, I’m suggesting definitions of four core aspects of discovery. These come out of our sustained research into discovery needs and practices, and have the goal of clarifying the relationship between discovery and other analytical categories. They are suggested, but should be internally coherent and consistent.

Discovery activity is: “Purposeful sense making activity that intends to arrive at new insights and understanding through exploration and analysis (and for these we have specific definitions as well) of all types and sources of data.”

Dis­cov­ery capa­bil­ity is: “The abil­ity of peo­ple and orga­ni­za­tions to pur­pose­fully real­ize valu­able insights that address the full spec­trum of busi­ness ques­tions and prob­lems by engag­ing effec­tively with all types and sources of data.”

Dis­cov­ery tools: “Enhance indi­vid­ual and orga­ni­za­tional abil­ity to real­ize novel insights by aug­ment­ing and accel­er­at­ing human sense mak­ing to allow engage­ment with all types of data at all use­ful scales.”

Dis­cov­ery envi­ron­ments: “Enable orga­ni­za­tions to under­take effec­tive dis­cov­ery efforts for all busi­ness pur­poses and per­spec­tives, in an empir­i­cal and coöper­a­tive fashion.”

Note: applicability to a world of Big Data is assumed (thus the references to all scales / types / sources) rather than stated explicitly. I like that Big Data doesn’t have to be written into this core set of definitions, because I think it’s a transitional label (the new version of Web 2.0) and goes away over time.

Comment » | Language of Discovery

Discovery and the Age of Insight

August 21st, 2013 — 1:06pm

Sev­eral weeks ago, I was invited to speak to an audi­ence of IT and busi­ness lead­ers at Wal­mart about the Lan­guage of Dis­cov­ery.   Every pre­sen­ta­tion is a feed­back oppor­tu­nity as much as a chance to broad­cast our lat­est think­ing (a tenet of what I call lean strat­egy prac­tice — musi­cians call it try­ing out new mate­r­ial), so I make a point to share evolv­ing ideas and syn­the­size what we’ve learned since the last instance of pub­lic dialog.

For the audience at Walmart, as part of the broader framing for the Age of Insight, I took the opportunity to share findings from some of the recent research we’ve done on Data Science (that’s right, we’re studying data science). We’ve engaged consistently with data science practitioners for several years now (some of the field’s leaders are alumni of Endeca), as part of our ongoing effort to understand the changing nature of analytical and sense making activities, the people undertaking them, and the contexts in which they take place. We’ve seen the discipline emerge from an esoteric specialty into full mainstream visibility for the business community. Interpreting what we’ve learned about data science through a structural and historic perspective led me to draw a broad parallel between data science now and natural philosophy at its early stages of evolution.

We also shared some excit­ing new mod­els for enter­prise infor­ma­tion engage­ment; craft­ing sce­nar­ios using the lan­guage of dis­cov­ery to describe infor­ma­tion needs and activ­ity at the level of dis­cov­ery archi­tec­ture, IT port­fo­lio plan­ning,  and knowl­edge man­age­ment (which cor­re­spond to UX, tech­nol­ogy, and busi­ness per­spec­tives as applied to larger scales and via busi­ness dia­log) — demon­strat­ing the ver­sa­til­ity of the lan­guage as a source of link­age across sep­a­rate disciplines.

But the primary message I wanted to share is that discovery is the most important organizational capability for the era. More on this in follow-up postings that focus on smaller chunks of the thinking encapsulated in the full deck of slides.

Comment » | Language of Discovery
