3_R

hrafnulf13
Oct 1, 2020

Updated: Oct 11, 2020

Differences between a (univariate) dataset and a (univariate) frequency distribution

A dataset is a collection of data. A frequency distribution, by contrast, reports the number of occurrences (frequency) of each distinct value, or of the values falling within a given time period, interval, or category, as a list, table, or graph [1-3].

A frequency distribution is constructed by grouping the values in the dataset [1]. Usually, construction starts by defining the intervals and then computing the frequency of each one. A variant uses relative frequencies instead of counts, which gives a relative frequency distribution. The relative frequency is the fraction or proportion of times a value occurs. To find the relative frequencies, divide each frequency by the total number of data points in the sample. Relative frequencies can be written as fractions, percentages, or decimals.
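As a minimal sketch of the computation just described, the following Python snippet counts the occurrences of each distinct value and divides each count by the sample size. The sample `data` and the function names are made up for illustration and do not come from the cited sources.

```python
from collections import Counter
from fractions import Fraction


def frequency_distribution(values):
    """Return {distinct value: number of occurrences}, sorted by value."""
    return dict(sorted(Counter(values).items()))


def relative_frequencies(freq):
    """Divide each frequency by the total number of data points."""
    total = sum(freq.values())
    return {value: Fraction(count, total) for value, count in freq.items()}


# Hypothetical sample of 10 observations (assumption, for illustration only).
data = [2, 3, 3, 5, 2, 3, 4, 5, 3, 2]

freq = frequency_distribution(data)
rel = relative_frequencies(freq)

for value in freq:
    # Each relative frequency shown as a fraction and as a decimal.
    print(value, freq[value], rel[value], float(rel[value]))
```

Fractions are used here so that the relative frequencies add up to exactly 1; plain floats would work as well, up to rounding.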

The guidelines for construction are [1]:

Each data value should fit into one interval only (intervals are mutually exclusive).
The intervals should be of equal size.
Intervals should not be open-ended.
Try to use between 5 and 20 intervals.
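The guidelines above can be sketched in code as follows. This is only one possible way to follow them: it assumes half-open intervals [lower, upper) of equal width, so that every value falls into exactly one interval, and it rejects values outside the covered range instead of adding an open-ended interval. The heights, the starting point, and the width are hypothetical.

```python
def grouped_frequencies(values, start, width, n_intervals):
    """Count values in n_intervals half-open intervals of equal width.

    Interval i covers [start + i*width, start + (i+1)*width).
    """
    counts = [0] * n_intervals
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_intervals:
            # No open-ended interval: out-of-range values are an error.
            raise ValueError(f"{v} is outside the covered range")
        counts[index] += 1
    return [
        ((start + i * width, start + (i + 1) * width), count)
        for i, count in enumerate(counts)
    ]


# Hypothetical heights in cm (assumption, for illustration only).
heights = [152, 158, 161, 164, 165, 169, 170, 171, 176, 183]

table = grouped_frequencies(heights, start=150, width=7, n_intervals=5)
for (lower, upper), count in table:
    print(f"[{lower}, {upper}): {count}")
```

With these made-up values the five intervals receive 1, 2, 4, 2, and 1 observations respectively.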

Frequency distributions are often displayed as histograms, for example the distribution of students' heights within a class [1].

Given a distribution, can we reconstruct the dataset? Why?

Whenever a distribution is constructed, information is lost. Specifically, the associations between the values and the statistical units are lost, so in general the original dataset cannot be reconstructed from a distribution. However, for a univariate dataset and its univariate distribution, reconstruction is possible if that association is not important: the dataset can be recovered as a collection of values, without knowing which unit each value belongs to. In other situations, such as multivariate datasets and distributions where each statistical unit has a unique combination of variables, it is impossible [4].
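A small sketch of the univariate case, under the assumption that the distribution is ungrouped (one frequency per distinct value): the values can be rebuilt from the frequencies, but the order in which they were attached to the units cannot. The two datasets and the helper `expand` are hypothetical.

```python
from collections import Counter


def expand(freq):
    """Rebuild a list of values from {value: frequency}, in sorted order."""
    return [value for value, count in sorted(freq.items()) for _ in range(count)]


# Two hypothetical datasets: same values, assigned to units in a different order.
dataset_a = [3, 2, 3, 5, 2]
dataset_b = [2, 5, 3, 2, 3]

# Both produce the same frequency distribution.
dist_a = Counter(dataset_a)
dist_b = Counter(dataset_b)

rebuilt = expand(dist_a)
print(rebuilt)               # the values, without their original order
print(dist_a == dist_b)      # the distribution cannot tell the two datasets apart
print(rebuilt == dataset_a)  # the unit-to-value association is not recovered
```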

How would you describe the change in the amount of information when passing from the dataset to the distribution?

It can be described as grouping, frequency counting, or categorization. For example, in a frequency distribution the values are categorized or grouped into their respective intervals. As a result, only the number of occurrences within each interval is kept.
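To illustrate what is left after grouping, the sketch below maps two different hypothetical datasets onto intervals of width 10 and compares the resulting counts. The interval width, the starting point, and the values are assumptions chosen for the example.

```python
from collections import Counter


def interval_counts(values, start, width):
    """Return {interval index: number of values falling in that interval}."""
    return Counter((v - start) // width for v in values)


# Two different hypothetical datasets (assumption, for illustration only).
first = [151, 155, 163, 168]
second = [150, 159, 160, 169]

counts_first = interval_counts(first, start=150, width=10)
counts_second = interval_counts(second, start=150, width=10)

# Same counts per interval, although no individual value is shared.
print(counts_first == counts_second)
```

Only the counts per interval survive, so the exact values inside each interval cannot be told apart from the distribution alone.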

References

[1] https://courses.lumenlearning.com/boundless-statistics/chapter/frequency-distributions-for-quantitative-data/
[2] https://en.wikipedia.org/wiki/Frequency_distribution
[3] https://www.toppr.com/guides/maths/data-handling/data-and-its-frequency-distribution/
[4] https://drive.google.com/file/d/1WkQVYbkofjAQlChoWbPstEUT9p_QcUrL/view

Statistics 2020-2021

MSc Cybersecurity, Sapienza University
