# [2017-07-05] Prof. William Cleveland, Purdue University, "Data Science: Divide and Recombine (D&R)"

**Title:** Data Science: Divide and Recombine (D&R)

**Date:** 2017-07-08 10:00am-11:00am

**Location:** 2nd floor, Department Building, CSIE

**Speaker:** Prof. William Cleveland, Purdue University

**Hosted by:** Prof. Shih-wei Liao

**Abstract**

Divide & Recombine with DeltaRho for Big Data and High Computational Complexity, Illustrated by Spamhaus Blacklist Data

Computational performance is challenging today. Datasets can be big, the computational complexity of analytic methods can be high, and computer hardware power can be limited. Small datasets can be challenging, too, when the computations have high complexity. Divide & Recombine (D&R) is a statistical approach to meet these challenges.

In D&R, the analyst divides the data into subsets by a D&R division

D&R computation is mostly embarrassingly parallel, the simplest parallel

D&R with DeltaRho provides deep analysis of datasets big in size and/or with
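
As a rough illustration of the divide, apply, and recombine pattern described in this abstract, the following Python sketch fits a least-squares slope to each subset independently and then averages the per-subset estimates. It is not the DeltaRho software or its API: the function names, the round-robin division rule, the choice of a slope fit as the analytic method, and averaging as the recombination are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def divide(records, n_subsets):
    """Divide step: round-robin split (a hypothetical division rule)."""
    return [records[i::n_subsets] for i in range(n_subsets)]


def fit_slope(subset):
    """Analytic method applied to one subset: least-squares slope of y on x."""
    xs = [x for x, _ in subset]
    ys = [y for _, y in subset]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in subset)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def recombine(outputs):
    """Recombine step: average the per-subset estimates (an assumed choice)."""
    return mean(outputs)


def divide_and_recombine(records, n_subsets=4):
    subsets = divide(records, n_subsets)
    # Each subset is processed independently, so the work can be mapped
    # over a pool of workers with no communication between tasks.
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(fit_slope, subsets))
    return recombine(outputs)


# Synthetic data lying on the line y = 2x + 1 (made up for this example).
records = [(float(x), 2.0 * x + 1.0) for x in range(100)]
estimate = divide_and_recombine(records)
```

Because no subset depends on another, the same structure could be spread across processes or machines by swapping the executor; only the recombination step needs all outputs together.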

**Biography**

William S. Cleveland is the Shanti S. Gupta Distinguished Professor of

In the course of this work, Cleveland has developed many new methods and models for data that are widely used throughout the worldwide technical community. He has led teams developing software to implement his methods that have become core programs in many commercial and open-source systems. Today, Cleveland and colleagues develop the Divide & Recombine approach for data big in size and for high computational complexity of analytic methods. Each analytic method is applied independently to each subset in a division of the data into subsets. Then outputs are recombined. This enables a data analyst to carry out detailed, comprehensive analysis of big data, to

In 2016 Cleveland received the Lifetime Achievement Award for Graphics and Computing from the American Statistical Association, the first since 2010. In 2016 he also received the Parzen Prize from Texas A&M University, given every two years since 1994 to a "statistician whose outstanding research contributions include innovations that have had impact on practice". In 1996 Cleveland was chosen national Statistician of the Year by the Chicago Chapter of the American Statistical Association. In 2002 he was selected as a Highly Cited Researcher by the American Society for Information Science & Technology in the newly formed mathematics category. He is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics, the American Association for the Advancement of Science, and the International Statistical Institute.