Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


probability - In Statistics, a Sample is a Single Data Point, or a Pool of Data Points?

This question has confused me a lot in statistics. I think that in statistics a sample is a pool of data points drawn from the PDF, rather than a single data point. In everyday language, if you sample something, for example by taking a sample from a candy jar, it means taking just a single candy. But in statistics, it seems that a sample always means repeating the action of taking a candy, say, 20 times (a sample of size 20). Am I correct?

Also, how do I interpret sampling in a random-variable context? Going back to the candy jar example, is each candy that I take, by repeating the act of taking a candy, an independent random variable?

question from:https://stackoverflow.com/questions/66050615/in-statistics-a-sample-is-a-a-single-data-point-or-a-pool-of-data-points


1 Reply


That's a good question with potentially deep ramifications. To clarify the matter, I believe it's a good idea to separate the concepts clearly. The simplest way I can summarize them is the following:

Sampling: The area of study concerned with methods and processes for selecting a subset (a sample) of sample units from a population in order to estimate characteristics of the whole population.

Population: This is the entire group of interest that we want information about. Examples: (i) all the emperor penguins in Antarctica, (ii) all married women in the USA, (iii) all children up to 10 years old in the world.

Notice how tricky it can get: your population can be a subset of another population - "all married women in the USA" is a subset of "all women in the world". The secret is to keep in mind that a population is the entire group of your interest in a given situation.

Population Characteristics: These are the aspects of the population that you want to measure. Examples: (i) the average height of emperor penguins in Antarctica, (ii) the average age of married women in the USA, (iii) the proportion of diabetic children up to 10 years old in the world.

Sample: A group formed by a subset of the population. A sample can contain anywhere from 1 to N sample units (see below), where N is the size of your population.

Sample Unit: Must be defined according to the interest of the study. It could be an individual, a family, a nation, etc. The choice must be made at the beginning of the study.

In your question, I believe we just have to separate the noun "sample" from the verb "to sample" to make things clear.

You could correctly say:

  • "I am sampling candy from a jar".
  • "I have a sample of candy of size 1".
  • "I have a sample of candy of size 25".
  • "I must have a sample of size 30".
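As a rough illustration of the statements above, here is a minimal Python sketch. The jar contents (60 green and 40 red candies) and the fixed seed are assumptions made up for the example, not something from the question. It only shows that a draw of one candy and a draw of 25 candies are both "a sample", differing in size.

```python
import random

# Hypothetical jar: 60 green and 40 red candies (numbers chosen for illustration).
jar = ["green"] * 60 + ["red"] * 40

rng = random.Random(0)  # fixed seed so the run is repeatable

# Random.sample draws without replacement, like taking candies out of the jar.
sample_of_1 = rng.sample(jar, 1)
sample_of_25 = rng.sample(jar, 25)

print("size:", len(sample_of_1), "contents:", sample_of_1)
print("size:", len(sample_of_25), "red candies:", sample_of_25.count("red"))
```

In both cases the verb (the act of drawing) is the same; only the size of the resulting noun (the sample) changes.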

I believe there's a tangential matter here, a classical statistics concept that falls beyond the scope of the question: statistical significance. You usually want a sufficiently large sample to infer information about the population of interest, which may be why people believe that there's no such thing as a sample of size one. Keep in mind, though, that some subjects involve the analysis of rare events, and in those cases your sample size will be small anyway.
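One way to see why larger samples are usually preferred is the standard error of a sample proportion, sqrt(p(1 - p)/n), which shrinks as the sample size n grows. The sketch below assumes independent draws and an illustrative true proportion of red candies of 0.4; both are assumptions for the example, and the function name is hypothetical.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion from n independent draws."""
    return math.sqrt(p * (1 - p) / n)

p_red = 0.4  # assumed true proportion of red candies (illustrative only)

# A sample of size 1 is still a sample; its estimate is just very imprecise.
for n in (1, 25, 30, 1000):
    print(n, round(standard_error(p_red, n), 4))
```

A sample of size 1 is perfectly valid as a sample, but the estimate it gives is far less precise than one from a sample of size 30.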

Last but not least, about sampling in a random-variable context, the most accurate answer is: it depends. Using your example, suppose your candy jar has only green and red candies. We could define a random variable X that is 0 if a sampled candy is green and 1 otherwise. But we could also define a random variable Y that is the number of green candies in a scoop that always takes 10 candies on each try. For X or Y we could consider scenarios with or without replacement, we could be interested in a variable Z = g(X, Y), and so on. Independence between sample units can vary according to your population and the "process" of interest.
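To make the independence point concrete, here is a small sketch that compares two consecutive draws with and without replacement (putting the candy back or not). The jar contents (6 green, 4 red) are a hypothetical choice, and exact fractions are used instead of simulation so the comparison is not blurred by random noise.

```python
from fractions import Fraction

GREEN, RED = 6, 4  # hypothetical jar contents
TOTAL = GREEN + RED

# X = 0 for a green candy and 1 otherwise (red), as in the text above.
p_red = Fraction(RED, TOTAL)  # P(X = 1) for any single draw

# With replacement: the jar is restored, so the first draw tells us nothing.
p_red_after_red_with = Fraction(RED, TOTAL)

# Without replacement: one red candy is already gone before the second draw.
p_red_after_red_without = Fraction(RED - 1, TOTAL - 1)

print("P(red):", p_red)
print("P(2nd red | 1st red), with replacement:", p_red_after_red_with)
print("P(2nd red | 1st red), without replacement:", p_red_after_red_without)

# The draws are independent only when the conditional equals the marginal.
print("independent with replacement:", p_red_after_red_with == p_red)
print("independent without replacement:", p_red_after_red_without == p_red)
```

Under these assumptions the draws are independent only when the candy is put back. With a very large jar the two conditional probabilities become nearly equal, which is why draws from a large population are often treated as approximately independent.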

