UC Santa Cruz has launched a new data science research center, Data, Discovery, and Decisions (D3). Led by Lise Getoor, professor of computer science in UCSC's Baskin School of Engineering, D3 provides a platform for collaboration between industry and academia in the emerging field of data science.
The ability to collect and analyze vast amounts of data has driven the emergence of data science as a new discipline. The Baskin School of Engineering has identified data science as a key focus area for the school.
"The establishment of the D3 Research Center within the Baskin School of Engineering will support our growing activities in data-driven discovery and decision making," said engineering dean Alexander Wolf. "It will provide an infrastructure for researchers in industry and academia to exchange ideas and develop practical solutions to data science challenges."
A central aim of D3 is to develop open-source tools to address the challenges and opportunities presented by the growing scale and heterogeneity of modern data sets.
"There is a lot of commonality in the problems and methods of dealing with heterogeneous, richly structured data, whether that data is from the Internet of Things, or information integration, or the modeling of socio-behavioral interactions," said Getoor, who chairs the Computing Research Association's Committee on Data Science.
The structure of D3 is modeled after the National Science Foundation's Industry/University Cooperative Research Centers, such as the Center for Research in Storage Systems at UC Santa Cruz. Industry partners will provide input on current industry research needs through participation in research panels, data sharing, and other types of collaboration. "We will work together to identify emerging challenges in data science and develop the tools needed to address them," Getoor said.
The founding members of D3 include Diffbot, which offers automatic web-crawling and bulk data processing capabilities to allow applications to leverage previously unstructured web data; Drawbridge, an identity management company that enables brands and enterprises to create personalized consumer experiences; and OmnyIQ, which provides services for device manufacturers to monitor, troubleshoot, and automatically repair connected devices. Supporting contributions have also come from Bosch, Glassbeam, and VMware.
"We know academic researchers are spending significant time tackling similar problems that we have been studying, and it's important to have a forum to exchange and cross-examine ideas and approaches," said Drawbridge CTO Devin Guan, who chairs the D3 industry advisory board. "Bridging academia and industry will encourage and expedite innovation in this field, which is beneficial for everyone."
Each of the member companies brings a wealth of expertise in data science, Getoor noted. "Bringing together their perspectives in a collaborative environment gives us the ability to see the commonalities and build widely applicable tools, rather than solving individual problems," she said.
Abel Rodriguez, professor of applied mathematics and statistics, will serve as associate director of D3. Other UCSC faculty members affiliated with the center include computer scientists Peter Alvaro, Seshadhri Comandur, Abhradeep Guha Thakurta, SVN Vishwanathan, and Marilyn Walker, and technology management professor Yi Zhang.
As an example of the type of open-source tools she envisions being developed at D3, Getoor cited Probabilistic Soft Logic, a machine learning framework her research group has developed for building probabilistic models over richly structured graph and network data.
Getoor will speak on "Data science: A collaboration of statistics and computer science" at the Joint Statistical Meetings (JSM 2017) in Baltimore on July 31, joining Robert Tibshirani of Stanford University in a special session sponsored by the Association for Computing Machinery (ACM).