Archive for October 2013

Understanding Data Science: Two Recent Studies

October 22nd, 2013 — 7:40am

If you need a deeper understanding of data science than Drew Conway’s popular Venn diagram model or Josh Wills’ tongue-in-cheek characterization (“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”), two relatively recent studies are worth reading.

‘Analyzing the Analyzers,’ an O’Reilly e-book by Harlan Harris, Sean Patrick Murphy, and Marck Vaisman, suggests four distinct types of data scientists (effectively personas, in a design sense) based on analysis of practitioners’ self-identified skills. The scenario format dramatizes the different personas, making what could be a dry statistical readout of survey data more engaging. The survey-only nature of the data, the restriction of scope to just skills, and the suggested models of skill profiles make this feel like the sort of exercise data scientists undertake as an everyday task: collecting data, analyzing it with a mix of statistical techniques, and sharing the model that emerges from the data mining exercise. That’s not an indictment, simply an observation that the effort has the consistent feel of a product made by data scientists, about data science.

And the paper ‘Enterprise Data Analysis and Visualization: An Interview Study,’ by researchers Sean Kandel, Andreas Paepcke, Joseph Hellerstein, and Jeffrey Heer, considers data science within the larger context of industrial data analysis, examining analytical workflows, skills, and the challenges common to enterprise analysis efforts, and identifying three archetypes of data scientist. Because the study is interview-based, the data the researchers collected is richer, and there’s correspondingly greater depth in the synthesis. The study’s scope included a broader set of roles than data scientist (enterprise analysts) and covered questions of workflow and organizational context for analytical efforts in general. I’d suggest it is useful as a primer on analytical work and workers in enterprise settings for those who need a baseline understanding; it also offers some genuinely interesting nuggets for those already familiar with discovery work.

We’ve undertaken a considerable amount of research into discovery, analytical work/ers, and data science over the past three years, as part of our programmatic approach to laying a foundation for product strategy and highlighting innovation opportunities. Both studies complement and confirm much of the direct research into data science that we conducted. There were a few important differences in our findings, which I’ll share and discuss in upcoming posts.

Language of Discovery, User Research

Defining Discovery: Core Concepts

October 18th, 2013 — 12:33pm

Discovery tools have had a referenceable working definition since at least 2001, when Ben Shneiderman published ‘Inventing Discovery Tools: Combining Information Visualization with Data Mining’. Dr. Shneiderman suggested that combining the two distinct fields of data mining and information visualization could produce a new category of tools for discovery, an understanding that remains essentially unaltered more than ten years later. An industry analyst report from March of this year, ‘Visual Discovery Tools: Market Segmentation and Product Positioning,’ for example, reads, “Visual discovery tools are designed for visual data exploration, analysis and lightweight data mining.”

Tools should follow from the activities people undertake (a foundational tenet of activity-centered design); however, Dr. Shneiderman does not actually describe or define discovery activity or capability. As I read it, discovery is assumed to be the implied sum of the separate fields of visualization and data mining as they were then understood. As a working definition that catalyzes a field of product prototyping, this is adequate in the short term. In the long term, it makes the boundaries of discovery both derived and temporary, and it leaves a substantial gap in the landscape of core concepts around discovery, making consensus on most aspects of discovery difficult or impossible to reach. I think this definitional gap is a major reason that discovery is still an ambiguous product landscape.

To help close that gap, I’m suggesting definitions for four core aspects of discovery. They come out of our sustained research into discovery needs and practices, and aim to clarify the relationship between discovery and other analytical categories. They are proposals, but they are intended to be internally coherent and consistent.

Discovery activity is: “Purposeful sense making activity that intends to arrive at new insights and understanding through exploration and analysis (and for these we have specific definitions as well) of all types and sources of data.”

Discovery capability is: “The ability of people and organizations to purposefully realize valuable insights that address the full spectrum of business questions and problems by engaging effectively with all types and sources of data.”

Discovery tools: “Enhance individual and organizational ability to realize novel insights by augmenting and accelerating human sense making to allow engagement with all types of data at all useful scales.”

Discovery environments: “Enable organizations to undertake effective discovery efforts for all business purposes and perspectives, in an empirical and coöperative fashion.”

Note: applicability to a world of Big Data is assumed (hence the references to all scales, types, and sources) rather than stated explicitly. I like that Big Data doesn’t have to be written into this core set of definitions, because I think it’s a transitional label, the new version of Web 2.0, that will fade away over time.


Language of Discovery
