Big Data News of the Week: Sexy and Social Data Scientists

Hal Varian (Photo credit: Wikipedia)

In a week when tech vendors traditionally go on a news hiatus, the people who actually do big data came to the fore. InformationWeek reported on the gathering of nearly three dozen data scientists from academia and business at the Chief Data Scientist Summit in Chicago. The sexiest part of performing the sexiest job of the 21st century, it turns out, is not the mastery of technology. "Engineering, I think you can pick up," said Scott Nicholson, chief data scientist at Accretive Health, "curiosity is built in." Catalin Ciobanu, senior manager-BI at Carlson Wagonlit Travel, expressed the same sentiment: "The thought process is the most important ingredient in data science."

Asking the right questions is probably the most important part of the thought process and it turns out you can run a company successfully as a data science experiment if you ask questions. Google’s Eric Schmidt (quoted by Bernard Marr): “We run the company by questions, not by answers. So in the strategy process we've so far formulated 30 questions that we have to answer […] You ask it as a question, rather than a pithy answer, and that stimulates conversation. Out of the conversation comes innovation.”

Google and other companies in its orbit have been getting help in their data science experiments from academics usually accused of being disconnected from the real world—economists. “A small group of the world’s top microeconomists are quietly revolutionising the discipline,” says The Economist. “Working for big technology firms such as Google, Microsoft and eBay, they are changing the way business decisions are made and markets work.” Two of the four data scientists mentioned by The Economist have left academia (Hal Varian and Preston McAfee, both at Google), while the other two have straddled both worlds (Steve Tadelis at eBay and Berkeley and Susan Athey at Microsoft and Stanford).

One aspect of their training that people with PhDs (whether in economics, physics or other disciplines) have brought to their work as data scientists is the practice of working in teams. Donnie Berkholz thinks being social is the sexiest part of being a data scientist: “We’re seeing the beginnings of bringing the collaboration models that have been vastly successful in open-source communities to data science… The future looks like this: The entire workflow from data to analysis to result to visualization will be social and collaborative.” A by-product of this collaborative process, EMC Greenplum’s Scott Yara tells Dan Woods, is a “living data dictionary,” where “multiple, shared, agreed-upon analytical models” replace the “single version of the truth” that has always been the goal of managing data in enterprises.

But Woods also talked to Amit Bendov, CEO of SiSense, a company that wants to put data science in the hands of people without PhDs or even basic training in data science. Says Bendov: “A lot of people are changing their title, but they’re not really data scientists, and there’s a lot of talk about the skills shortage. There just aren’t enough of them.” Gartner’s Regina Casonato thinks that not only data scientists but also business analysts, data stewards, information architects, and data warehouse architects will be in short supply. In short, big data requires big investment in training and education, even if companies like SiSense manage to simplify and automate data handling, warehousing, and analysis.

Still, for all the talk about the skills shortage, this week brought quite a few reports of successful applications of data science:

Startup Opower collects data from more than 50 million homes, analyzes it and provides recommendations to utility customers on how to reduce their energy consumption; it says it will be able to save 2 terawatt hours of energy by the end of 2012. That amount of energy is equivalent to the annual consumption of 200,000 average U.S. homes, or $200 million in energy cost savings.
Globys is using big data for targeted marketing, telling telecommunications companies when the time is right to interact with their customers.
Aetna Health is using big data analysis to come up with personalized treatment plans that assess patient risk factors and focus on treating the one or two things that will have the most impact (statistically speaking) on improving the patient's health.
Scientists analyzed calls and texts involving almost 15 million Kenyan mobile phone subscribers, nearly 12,000 cell towers, and 692 different settlements and correlated that data with a map of malaria cases. On that basis, they were able to figure out the likelihood of infection for people passing through locations associated with the disease and provide recommendations for preventing its spread.
The Frederick National Laboratory for Cancer Research (winner of the 2012 Government Big Data Solutions Award) has been using big data solutions to support researchers working on understanding the relationship between genes and cancers. In a recent example, they built a big data infrastructure capable of cross-referencing the relationships between 17,000 genes and five major cancer subtypes across 20 million biomedical publication abstracts. The result: understanding additional layers of the pathways these genes operate in and the drugs that target them.
IBM is testing a new traffic-management technology in a pilot program in Lyon, France, that’s designed to provide the city’s transportation engineers with “real-time decision support” so they can proactively reduce congestion. The technology uses IBM’s Data Expansion Algorithm to combine old and new data to predict future traffic flow. Over time the system “learns” from successful outcomes to fine-tune future recommendations.

Data science is everywhere and Barry Eggers of Lightspeed Venture Partners tells us that one major league team—“and likely more”—is evaluating a small Hadoop cluster. He believes that “it’s not hard to imagine a day where [baseball] managers… have their locker room data scientist run real-time, in-game analytics using technologies like Cassandra, Hbase, Drill, and Impala.”

All these real-world examples get investors' attention, and this week we heard about two new funding rounds:

Always Prepped raised $650,000 from angel investors to create a Web site that aggregates information from the different databases where teachers keep records of students’ attendance and academic performance. Says founder Fahad Hassan: “Data is everywhere. It exists. We’re just pulling it into one place and our goal is to make it consumable for teachers.” And Lattice Engines, a company helping sales reps be more productive by analyzing data to extract real-time insights about customer needs and behavior, has raised $20 million in equity funding from New Enterprise Associates, with participation from Sequoia Capital.

Finally, in the “what to look for” department, the influential data scientist Daniel Tunkelang has started to share on LinkedIn his vast experience with measuring influence, an age-old challenge for social data science, and Susan Fourtané provides a round-up of upcoming big data events.
