Problem: Accurately Recovering Distribution
- Summarizing the data with a single statistic (such as a mean) can be misleading: it hides the shape of the distribution.
- Even histograms can miss key features when the buckets are poorly placed.
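To make the two points above concrete, here is a minimal sketch using made-up data. The sample values, the bucket edges, and the `histogram` helper are all assumptions for illustration; none of them come from the original material.

```python
import statistics

# Hypothetical samples: one tightly clustered, one split into two modes.
unimodal = [50] * 100
bimodal = [10] * 50 + [90] * 50

def histogram(samples, edges):
    """Count samples in half-open buckets [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# A single statistic cannot tell the two datasets apart.
mean_uni = statistics.mean(unimodal)
mean_bi = statistics.mean(bimodal)

# A histogram with one wide bucket cannot tell them apart either;
# narrower buckets reveal the two modes.
coarse = histogram(bimodal, [0, 100])
fine = histogram(bimodal, [0, 25, 50, 75, 100])
```

Both datasets have the same mean, and the single-bucket histogram is identical for both; only the finer bucketing exposes the two modes in `bimodal`.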
How do we choose bucket boundaries in a histogram to minimize error?
(The method must also be convenient: it should require no prior knowledge of the distribution, use fixed storage, and be computationally efficient.)
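As a rough illustration of why boundary placement matters, the sketch below compares two sets of boundaries with the same number of buckets on the same data. The error measure (mean absolute error after replacing each sample with its bucket midpoint), the sample data, and the function name `bucket_midpoint_error` are assumptions chosen for this example, not definitions from the original material.

```python
import bisect

def bucket_midpoint_error(samples, edges):
    """Mean absolute error when each sample is replaced by its bucket midpoint.

    This error measure is an assumption made for illustration.
    """
    total = 0.0
    last = len(edges) - 2  # index of the final bucket
    for x in samples:
        # Locate the bucket containing x, clamping values on the outer edges.
        i = min(max(bisect.bisect_right(edges, x) - 1, 0), last)
        total += abs(x - (edges[i] + edges[i + 1]) / 2)
    return total / len(samples)

# Hypothetical data: values 0..9 repeated ten times, plus ten outliers at 100.
samples = list(range(10)) * 10 + [100] * 10

equal_width = [0, 50, 100]   # two buckets of equal width
data_aware = [0, 10, 100]    # two buckets, boundary placed after the cluster

err_equal_width = bucket_midpoint_error(samples, equal_width)
err_data_aware = bucket_midpoint_error(samples, data_aware)
```

On this data the equal-width boundaries give an error of about 20.9, while the hand-placed boundaries give about 6.4. The hand-placed boundaries were chosen by looking at the data, which the convenience requirements rule out; the problem is to get comparably low error without that prior knowledge.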