Data Science Highlights: An Investigation of the Discipline

March 28th, 2014 — 1:26pm

I’ve posted a substantial readout summarizing some of the more salient findings from a long-running programmatic research effort into data science. This deck shares synthesized findings on many facets of data science as a discipline, including practices, workflow, tools, org models, and skills. The readout distills a very wide range of inputs: direct interviews; field-based ethnography; community participation (real-world and online); secondary research from industry and academic sources; analysis of hiring and investment activity in data science over several years; descriptive and definitional artifacts authored by practitioners, analysts, educators, and other external actors; media coverage of data science; historical antecedents; the structure and evolution of professional disciplines; and more.

I consider it a sort of business-anthropology-style investigation of data science, conducted from the viewpoint of product making’s primary aspects: strategy, management, design, and delivery.

I learned a great deal dur­ing the course of this effort, and expect to con­tinue to learn, as data sci­ence will con­tinue to evolve rapidly for the next sev­eral years.

Data sci­ence prac­ti­tion­ers look­ing at this mate­r­ial are invited to pro­vide feed­back about where these mate­ri­als are accu­rate or inac­cu­rate, and most espe­cially about what is miss­ing, and what is com­ing next for this very excit­ing field.



Data Sci­ence High­lights from Joe Laman­tia



Comment » | Big Data, User Research

Data Science and Empirical Discovery: A New Discipline Pioneering a New Analytical Method

March 26th, 2014 — 11:00am

One of the essen­tial pat­terns of sci­ence and indus­try in the mod­ern era is that new meth­ods for under­stand­ing — what I’ll call sense­mak­ing from now on — often emerge hand in hand with new pro­fes­sional and sci­en­tific dis­ci­plines.  This link­age between new dis­ci­plines and new meth­ods fol­lows from the  decep­tively sim­ple imper­a­tive to real­ize new types of insight, which often means analy­sis of new kinds of data, using new tech­niques, applied from newly defined per­spec­tives. New view­points and new ways of under­stand­ing are lit­er­ally bound together in a sort of symbiosis.

One famil­iar exam­ple of this dynamic is the rapid devel­op­ment of sta­tis­tics dur­ing the 18th and 19th cen­turies, in close par­al­lel with the rise of new social sci­ence dis­ci­plines includ­ing eco­nom­ics (orig­i­nally polit­i­cal econ­omy) and soci­ol­ogy, and nat­ural sci­ences such as astron­omy and physics.  On a very broad scale, we can see the pat­tern in the tan­dem evo­lu­tion of the sci­en­tific method for sense­mak­ing, and the cod­i­fi­ca­tion of mod­ern sci­en­tific dis­ci­plines based on pre­cur­sor fields such as nat­ural his­tory and nat­ural phi­los­o­phy dur­ing the sci­en­tific rev­o­lu­tion.

Today, we can see this pattern clearly in the simultaneous emergence of Data Science as a new and distinct discipline accompanied by Empirical Discovery, the new sensemaking and analysis method Data Science is pioneering.  Given its dramatic rise to prominence recently, declaring Data Science a new professional discipline should inspire little controversy. Declaring Empirical Discovery a new method may seem bolder, but with the essential pattern of new disciplines appearing in tandem with new sensemaking methods in mind, it is more controversial to suggest Data Science is a new discipline that lacks a corresponding new method for sensemaking.  (I would argue it is the method that makes the discipline, not the other way around, but that is a topic for fuller treatment elsewhere.)

What is empir­i­cal dis­cov­ery?  While empir­i­cal dis­cov­ery is a new sense­mak­ing method, we can build on two exist­ing foun­da­tions to under­stand its dis­tin­guish­ing char­ac­ter­is­tics, and help craft an ini­tial def­i­n­i­tion.  The first of these is an under­stand­ing of the empir­i­cal method. Con­sider the fol­low­ing description:

“The empirical method is not sharply defined and is often contrasted with the precision of the experimental method, where data are derived from the systematic manipulation of variables in an experiment.  …The empirical method is generally characterized by the collection of a large amount of data before much speculation as to their significance, or without much idea of what to expect, and is to be contrasted with more theoretical methods in which the collection of empirical data is guided largely by preliminary theoretical exploration of what to expect. The empirical method is necessary in entering hitherto completely unexplored fields, and becomes less purely empirical as the acquired mastery of the field increases. Successful use of an exclusively empirical method demands a higher degree of intuitive ability in the practitioner.”

Data Sci­ence as prac­ticed is largely con­sis­tent with this pic­ture.  Empir­i­cal pre­rog­a­tives and under­stand­ings shape the pro­ce­dural plan­ning of Data Sci­ence efforts, rather than the­o­ret­i­cal con­structs.  Semi-formal approaches pre­dom­i­nate over explic­itly cod­i­fied meth­ods, sig­nal­ing the impor­tance of intu­ition.  Data sci­en­tists often work with data that is on-hand already from busi­ness activ­ity, or data that is newly gen­er­ated through nor­mal busi­ness oper­a­tions, rather than seek­ing to acquire wholly new data that is con­sis­tent with the design para­me­ters and goals of for­mal exper­i­men­tal efforts.  Much of the sense­mak­ing activ­ity around data is explic­itly exploratory (what I call the ‘pan­ning for gold’ stage of evo­lu­tion — more on this in sub­se­quent post­ings), rather than sys­tem­atic in the manip­u­la­tion of known vari­ables.  These exploratory tech­niques are used to address rel­a­tively new fields such as the Inter­net of Things, wear­ables, and large-scale social graphs and col­lec­tive activ­ity domains such as instru­mented envi­ron­ments and the quan­ti­fied self.  These new domains of appli­ca­tion are not mature in ana­lyt­i­cal terms; ana­lysts are still work­ing to iden­tify the most effec­tive tech­niques for yield­ing insights from data within their bounds.

The second relevant perspective is our understanding of discovery as an activity that is distinct and recognizable in comparison to generalized analysis. From this, we can summarize discovery as sensemaking intended to arrive at novel insights, through exploration and analysis of diverse and dynamic data in an iterative and evolving fashion.

Looking deeper, one specific characteristic of discovery as an activity is the absence of formally articulated statements of belief and expected outcomes at the beginning of most discovery efforts.  Another is the iterative nature of discovery efforts, which can change course in non-linear ways and even ‘backtrack’ on the way to arriving at insights: both the data and the techniques used to analyze data change during discovery efforts.  Formally defined experiments are much more clearly determined from the beginning, and their definition is less open to change during their course. A program of related experiments conducted over time may show iterative adaptation of goals, data and methods, but the individual experiments themselves are not malleable and dynamic in the fashion of discovery.  Discovery’s emphasis on novel insight as preferred outcome is another important characteristic; by contrast, formal experiments are repeatable and verifiable by definition, and the degree of repeatability is a criterion of well-designed experiments.  Discovery efforts often involve an intuitive shift in perspective that is recountable and retraceable in retrospect, but cannot be anticipated.

Build­ing on these two foun­da­tions, we can define Empir­i­cal Dis­cov­ery as a hybrid, pur­pose­ful, applied, aug­mented, iter­a­tive and serendip­i­tous method for real­iz­ing novel insights for busi­ness, through analy­sis of large and diverse data sets.

Let’s look at these facets in more detail.

Empir­i­cal dis­cov­ery pri­mar­ily addresses the prac­ti­cal goals and audi­ences of busi­ness (or indus­try), rather than sci­en­tific, aca­d­e­mic, or the­o­ret­i­cal objec­tives.  This is tremen­dously impor­tant, since  the prac­ti­cal con­text impacts every aspect of Empir­i­cal Discovery.

‘Large and diverse data sets’ reflects the fact that Data Science practitioners engage with Big Data as we currently understand it: situations in which the confluence of data types and volumes exceeds the capabilities of business analytics to practically realize insights in terms of tools, infrastructure, practices, etc.

Empirical discovery uses a rapidly evolving hybridized toolkit, blending a wide range of general and advanced statistical techniques with sophisticated exploratory and analytical methods from a wide variety of sources, including data mining, natural language processing, machine learning, neural networks, Bayesian analysis, and emerging techniques such as topological data analysis and deep learning.

What’s most notable about this hybrid toolkit is that Empirical Discovery does not originate novel analysis techniques; it borrows tools from established disciplines such as information retrieval, artificial intelligence, computer science, and the social sciences.  Many of the more specialized or apparently exotic techniques data science and empirical discovery rely on, such as support vector machines, deep learning, or measuring mutual information in data sets, have established histories of usage in academic or other industry settings, and have reached reasonable levels of maturity.  Empirical discovery’s hybrid toolkit is transposed from one domain of application to another, rather than invented.

Empir­i­cal Dis­cov­ery is an applied method in the same way Data Sci­ence is an applied dis­ci­pline: it orig­i­nates in and is adapted to busi­ness con­texts, it focuses on arriv­ing at use­ful insights to inform busi­ness activ­i­ties, and it is not used to con­duct basic research.  At this early stage of devel­op­ment, Empir­i­cal Dis­cov­ery has no inde­pen­dent and artic­u­lated the­o­ret­i­cal basis and does not (yet) advance a dis­tinct body of knowl­edge based on the­ory or prac­tice. All viable dis­ci­plines have a body of knowl­edge, whether for­mal or infor­mal, and applied dis­ci­plines have only their cumu­la­tive body of knowl­edge to dis­tin­guish them, so I expect this to change.

Empir­i­cal dis­cov­ery is not only applied, but explic­itly pur­pose­ful in that it is always set in motion and directed by an agenda from a larger con­text, typ­i­cally the spe­cific busi­ness goals of the orga­ni­za­tion act­ing as a prime mover and fund­ing data sci­ence posi­tions and tools.  Data Sci­ence prac­ti­tion­ers effect Empir­i­cal Dis­cov­ery by mak­ing it hap­pen on a daily basis — but wher­ever there is empir­i­cal dis­cov­ery activ­ity, there is sure to be inten­tion­al­ity from a busi­ness view.  For exam­ple, even in orga­ni­za­tions with a for­mal hack time pol­icy, our research sug­gests there is lit­tle or no com­pletely undi­rected or self-directed empir­i­cal dis­cov­ery activ­ity, whether con­ducted by for­mally rec­og­nized Data Sci­ence prac­ti­tion­ers, busi­ness ana­lysts, or others.

One very important implication of the situational purposefulness of Empirical Discovery is that there is no direct imperative for generating a body of cumulative knowledge through original research: the insights that result from Empirical Discovery efforts are judged by their practical utility in an immediate context.  There is also no explicit scientific burden of proof or verifiability associated with Empirical Discovery within its primary context of application.  Many practitioners encourage some aspects of verifiability, for example, by annotating the various sources of data used for their efforts and the transformations involved in wrangling data on the road to insights or data products, but this is not a requirement of the method.  Another implication is that empirical discovery does not adhere to any explicit moral, ethical, or value-based missions that transcend working context.  While Data Scientists often interpret their role as transformative, this is in reference to business.  Data Science is not medicine, for example, with a Hippocratic oath.

Empirical Discovery is an augmented method in that it depends on computing and machine resources to increase human analytical capabilities: it is simply impractical for people to manually undertake many of the analytical techniques common to Data Science.  An important point to remember about augmented methods is that they are not automated; people remain necessary, and it is the combination of human and machine that is effective at yielding insights.  In the problem domain of discovery, the patterns of sensemaking activity leading to insight are intuitive, non-linear, and associative; activities with these characteristics are not fully automatable with current technology. And while many analytical techniques can be usefully automated within boundaries, these tasks typically make up just a portion of a complete discovery effort.  For example, using latent class analysis to explore a machine-sampled subset of a larger data corpus is task-specific automation complementing human perspective at particular points of the Empirical Discovery workflow.  This dependence on machine-augmented analytical capability is recent within the history of analytical methods.  In most of the modern era (roughly the later 17th, 18th, 19th and early 20th centuries) the data employed in discovery efforts was manageable ‘by hand’, even when using the newest mathematical and analytical methods emerging at the time.  This remained true until the effective commercialization of machine computing ended the need for human computers as a recognized role in the middle of the 20th century.

The real­ity of most ana­lyt­i­cal efforts — even those with good ini­tial def­i­n­i­tion — is that insights often emerge in response to and in tan­dem with chang­ing and evolv­ing ques­tions which were not iden­ti­fied, or per­haps not even under­stood, at the out­set.  Dur­ing dis­cov­ery efforts, ana­lyt­i­cal goals and tech­niques, as well as the data under con­sid­er­a­tion, often shift in unpre­dictable ways, mak­ing the path to insight dynamic and non-linear.  Fur­ther, the sources of and inspi­ra­tions for insight are  dif­fi­cult or impos­si­ble to iden­tify both at the time and in ret­ro­spect. Empir­i­cal dis­cov­ery addresses the com­plex and opaque nature of dis­cov­ery with iter­a­tion and adap­ta­tion, which com­bine  to set the stage for serendip­ity.

With this initial definition of Empirical Discovery in hand, the natural question is what this means for Data Science and business analytics.  Three things stand out for me.  First, I think one of the central roles played by Data Science is in pioneering the application of existing analytical methods from specialized domains to serve general business goals and perspectives, seeking effective ways to work with the new types (graph, sensor, social, etc.) and tremendous volumes (yotta, yotta, yotta…) of business data at hand in the Big Data moment and realize insights.

Second, following from this, Empirical Discovery is a methodological framework within and through which a great variety of analytical techniques at differing levels of maturity and from other disciplines are vetted for business analytical utility in iterative fashion by Data Science practitioners.

And third, it seems this vet­ting func­tion is delib­er­ately part of the makeup of empir­i­cal dis­cov­ery, which I con­sider a very clever way to cre­ate a feed­back loop that enhances Data Sci­ence prac­tice by using Empir­i­cal Dis­cov­ery as a dis­cov­ery tool for refin­ing its own methods.

Comment » | Big Data

Big Data is a Condition (Or, "It's (Mostly) In Your Head")

March 10th, 2014 — 1:07pm

Unsur­pris­ingly, def­i­n­i­tions of Big Data run the gamut from the turgid to the flip, mak­ing room to include the trite, the breath­less, and the sim­ply un-inspiring in the big cir­cle around the camp­fire. Some of these def­i­n­i­tions are use­ful in part, but none of them cap­tures the essence of the mat­ter. Most are mis­takes in kind, try­ing to ground and cap­ture Big Data as a ‘thing’ of some sort that is mea­sur­able in objec­tive terms. Any­time you encounter a num­ber, this is the school of thought.

Some approach Big Data as a state of being, most often a sim­ple oper­a­tional state of insuf­fi­ciency of some kind; typ­i­cally resources like ana­lysts, com­pute power or stor­age for han­dling data effec­tively; occa­sion­ally some­thing less quan­tifi­able like clar­ity of pur­pose and cri­te­ria for man­age­ment. Any­time you encounter phras­ing that relies on the reader to inter­pret and define the par­tic­u­lars of the insuf­fi­ciency, this is the school of thought.

I see Big Data as a self-defined (per­haps diag­nosed is more accu­rate) con­di­tion, but one that is based on idio­syn­cratic inter­pre­ta­tion of cur­rent and pos­si­ble future sit­u­a­tions in which under­stand­ing of, plan­ning for, and activ­ity around data are central.

Here’s my working definition: Big Data is the condition in which very high actual or expected difficulty in working successfully with data combines with very high anticipated but unknown value and benefit, leading to the a priori assumption that currently available information management and analytical capabilities are broadly insufficient, making new and previously unknown capabilities seemingly necessary.

Comment » | Big Data

Strata New York Video: Designing Big Data Interactions With the Language of Discovery

December 6th, 2013 — 12:41pm

I’m late to making it available here, but O’Reilly Media published the video recording of my presentation on The Language of Discovery: A Toolkit For Designing Big Data Interactions from last year’s (2012) Strata conference in NY.

Look­ing back at this, I’m happy to say that while my think­ing on sev­eral of the key ideas has advanced quite a bit in the past 12 months (see our more recent mate­ri­als), the core ideas and con­cepts remain vital.

Those are, briefly:

  • Big Data is use­less unless peo­ple can engage with it effectively
  • Dis­cov­ery is a crit­i­cal and inad­e­quately acknowl­edged aspect of sense mak­ing that is core to real­iz­ing value from Big Data
  • Dis­cov­ery is lit­er­ally the most impor­tant human/machine inter­ac­tion in the emerg­ing Age of Insight
  • Pro­vid­ing dis­cov­ery capa­bil­ity requires under­stand­ing people’s needs and goals
  • The Lan­guage of Dis­cov­ery is an effec­tive tool for under­stand­ing dis­cov­ery needs and activ­i­ties, and design­ing solutions
  • There are known pat­terns and struc­ture in dis­cov­ery activ­i­ties that you can use to cre­ate dis­cov­ery solutions

I’ve posted it to Vimeo for easier viewing; slides are here for those who wish to follow along. Enjoy!



Comment » | Language of Discovery

Understanding Data Science: Two Recent Studies

October 22nd, 2013 — 7:40am

If you need a deeper understanding of data science than Drew Conway’s popular Venn diagram model, or Josh Wills’ tongue-in-cheek characterization (“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”), two relatively recent studies are worth reading.

‘Analyzing the Analyzers,’ an O’Reilly e-book by Harlan Harris, Sean Patrick Murphy, and Marck Vaisman, suggests four distinct types of data scientists (effectively personas, in a design sense) based on analysis of self-identified skills among practitioners.  The scenario format dramatizes the different personas, making what could be a dry statistical readout of survey data more engaging.  The survey-only nature of the data, the restriction of scope to just skills, and the suggested models of skill-profiles make this feel like the sort of exercise that data scientists undertake as an everyday task: collecting data, analyzing it using a mix of statistical techniques, and sharing the model that emerges from the data mining exercise.  That’s not an indictment, simply an observation about the consistent feel of the effort as a product of data scientists, about data science.

And the paper ‘Enterprise Data Analysis and Visualization: An Interview Study’ by researchers Sean Kandel, Andreas Paepcke, Joseph Hellerstein, and Jeffrey Heer considers data science within the larger context of industrial data analysis, examining analytical workflows, skills, and the challenges common to enterprise analysis efforts, and identifying three archetypes of data scientist.  As an interview-based study, the data the researchers collected is richer, and there’s correspondingly greater depth in the synthesis.  The scope of the study included a broader set of roles than data scientist (enterprise analysts) and involved questions of workflow and organizational context for analytical efforts in general.  I’d suggest this is useful as a primer on analytical work and workers in enterprise settings for those who need a baseline understanding; it also offers some genuinely interesting nuggets for those already familiar with discovery work.

We’ve under­taken a con­sid­er­able amount of research into dis­cov­ery, ana­lyt­i­cal work/ers, and data sci­ence over the past three years — part of our pro­gram­matic approach to lay­ing a foun­da­tion for prod­uct strat­egy and high­light­ing inno­va­tion oppor­tu­ni­ties — and both stud­ies com­ple­ment and con­firm much of the direct research into data sci­ence that we con­ducted. There were a few impor­tant dif­fer­ences in our find­ings, which I’ll share and dis­cuss in upcom­ing posts.

Comment » | Language of Discovery, User Research

Defining Discovery: Core Concepts

October 18th, 2013 — 12:33pm

Discovery tools have had a referenceable working definition since at least 2001, when Ben Shneiderman published ‘Inventing Discovery Tools: Combining Information Visualization with Data Mining’.  Dr. Shneiderman suggested the combination of the two distinct fields of data mining and information visualization could manifest as a new category of tools for discovery, an understanding that remains essentially unaltered over ten years later.  An industry analyst report titled Visual Discovery Tools: Market Segmentation and Product Positioning from March of this year, for example, reads, “Visual discovery tools are designed for visual data exploration, analysis and lightweight data mining.”

Tools should fol­low from the activ­i­ties peo­ple under­take (a foun­da­tional tenet of activ­ity cen­tered design), how­ever, and Dr. Shnei­der­man does not in fact describe or define dis­cov­ery activ­ity or capa­bil­ity. As I read it, dis­cov­ery is assumed to be the implied sum of the sep­a­rate fields of visu­al­iza­tion and data min­ing as they were then under­stood.  As a work­ing def­i­n­i­tion that cat­alyzes a field of prod­uct pro­to­typ­ing, it’s ade­quate in the short term.  In the long term, it makes the bound­aries of dis­cov­ery both derived and tem­po­rary, and leaves a sub­stan­tial gap in the land­scape of core con­cepts around dis­cov­ery, mak­ing con­sen­sus on the nature of most aspects of dis­cov­ery dif­fi­cult or impos­si­ble to reach.  I think this def­i­n­i­tional gap is a major rea­son that dis­cov­ery is still an ambigu­ous prod­uct landscape.

To help close that gap, I’m suggesting a few definitions of four core aspects of discovery.  These come out of our sustained research into discovery needs and practices, and have the goal of clarifying the relationship between discovery and other analytical categories.  They are suggested, but should be internally coherent and consistent.

Discovery activity is: “Purposeful sense making activity that intends to arrive at new insights and understanding through exploration and analysis (and for these we have specific definitions as well) of all types and sources of data.”

Dis­cov­ery capa­bil­ity is: “The abil­ity of peo­ple and orga­ni­za­tions to pur­pose­fully real­ize valu­able insights that address the full spec­trum of busi­ness ques­tions and prob­lems by engag­ing effec­tively with all types and sources of data.”

Dis­cov­ery tools: “Enhance indi­vid­ual and orga­ni­za­tional abil­ity to real­ize novel insights by aug­ment­ing and accel­er­at­ing human sense mak­ing to allow engage­ment with all types of data at all use­ful scales.”

Dis­cov­ery envi­ron­ments: “Enable orga­ni­za­tions to under­take effec­tive dis­cov­ery efforts for all busi­ness pur­poses and per­spec­tives, in an empir­i­cal and coöper­a­tive fashion.”

Note: applicability to a world of Big Data is assumed (thus the refs to all scales / types / sources) rather than stated explicitly.  I like that Big Data doesn’t have to be written into this core set of definitions, b/c I think it’s a transitional label (the new version of Web 2.0) that goes away over time.


Comment » | Language of Discovery

Discovery and the Age of Insight

August 21st, 2013 — 1:06pm

Sev­eral weeks ago, I was invited to speak to an audi­ence of IT and busi­ness lead­ers at Wal­mart about the Lan­guage of Dis­cov­ery.   Every pre­sen­ta­tion is a feed­back oppor­tu­nity as much as a chance to broad­cast our lat­est think­ing (a tenet of what I call lean strat­egy prac­tice — musi­cians call it try­ing out new mate­r­ial), so I make a point to share evolv­ing ideas and syn­the­size what we’ve learned since the last instance of pub­lic dialog.

For the audience at Walmart, as part of the broader framing for the Age of Insight, I took the opportunity to share findings from some of the recent research we’ve done on Data Science (that’s right, we’re studying data science).  We’ve engaged consistently with data science practitioners for several years now (some of the field’s leaders are alumni of Endeca), as part of our ongoing effort to understand the changing nature of analytical and sense making activities, the people undertaking them, and the contexts in which they take place.  We’ve seen the discipline emerge from an esoteric specialty into full mainstream visibility for the business community.  Interpreting what we’ve learned about data science through a structural and historic perspective led me to draw a broad parallel between data science now and natural philosophy at its early stages of evolution.

We also shared some excit­ing new mod­els for enter­prise infor­ma­tion engage­ment; craft­ing sce­nar­ios using the lan­guage of dis­cov­ery to describe infor­ma­tion needs and activ­ity at the level of dis­cov­ery archi­tec­ture, IT port­fo­lio plan­ning,  and knowl­edge man­age­ment (which cor­re­spond to UX, tech­nol­ogy, and busi­ness per­spec­tives as applied to larger scales and via busi­ness dia­log) — demon­strat­ing the ver­sa­til­ity of the lan­guage as a source of link­age across sep­a­rate disciplines.

But the pri­mary mes­sage I wanted to share is that dis­cov­ery is the most impor­tant orga­ni­za­tional capa­bil­ity for the era.  More on this in fol­low up post­ings that focus on smaller chunks of the think­ing encap­su­lated in the full deck of slides.

Dis­cov­ery and the Age of Insight: Wal­mart EIM Open House 2013 from Joe Laman­tia

Comment » | Language of Discovery

Big Data Is Not the Insight: Slides From Enterprise Search Europe

May 21st, 2013 — 12:42pm

Slides from my talk Big Data Is Not the Insight: The Lan­guage of Dis­cov­ery at Enter­prise Search Europe in Lon­don last week are avail­able for view­ing and down­load from slideshare. The con­fer­ence was a good gath­er­ing of lead­ing per­spec­tives on search in Europe, def­i­nitely one I’d look for­ward to attend­ing again. And of course Lon­don is lovely in May, even when it feels more like win­ter than spring…

Big Data Is Not the Insight: The Lan­guage Of Dis­cov­ery: from Joe Laman­tia

Comment » | Language of Discovery, User Experience (UX), User Research

"Meet the Speaker" Interview for Enterprise Search Europe

April 30th, 2013 — 10:12am

I did a mod­est ‘meet the speaker’ inter­view with the orga­niz­ers of next month’s Enter­prise Search Europe con­fer­ence in Lon­don — it’s pub­lished here:

Here’s an excerpt:

It turns out that you can use a very sim­ple vocab­u­lary to spot and describe com­plex pat­terns in search­ing and sense mak­ing behav­iour, pat­terns that tran­scend typ­i­cal bound­aries like domain of use. You can also design solu­tions using the vocab­u­lary; from the inter­ac­tion design of work­spaces, to the var­i­ous data mod­els and infor­ma­tion struc­tures that under­lie your sys­tem. This com­bi­na­tion of ana­lyt­i­cal and gen­er­a­tive uses is unusual, and I wanted to share it.

And if you’re in the neigh­bor­hood of Lon­don May 13 — 17, and would like to con­nect to talk about search, dis­cov­ery, or related top­ics, ping me — I’d like to meet up with some new folks for my first visit to Lon­don in a few years.

Comment » | Language of Discovery

UX Australia Recording Available

February 19th, 2013 — 11:38am

The audio record­ing of my pre­sen­ta­tion Design­ing Big Data Inter­ac­tions in the Age of Insight from UX Aus­tralia 2012 was just published.

It’s avail­able for direct down­load from the ses­sion page, in the iTunes store, and as part of the pod­cast series for all the ses­sions at UX Australia.

Comment » | Language of Discovery, User Experience (UX), User Research
