Estimation of experimental data redundancy and related statistics

I. Grabec

Igor Grabec, updated 2007-10-10

https://arxiv.org/abs/0704.0162
Redundancy of experimental data is the basic statistic from which the complexity of a natural phenomenon and the proper number of experiments needed for its exploration can be estimated. The redundancy is expressed by the entropy of information pertaining to the probability density function of the experimental variables. Because calculating the entropy is inconvenient, as it requires integration over the range of the variables, an approximate expression for redundancy is derived that involves only a sum over the set of experimental data about these variables. The approximation makes possible an efficient estimation of the redundancy of data, along with the related experimental information and information cost function. From the experimental information the complexity of the phenomenon can be estimated simply, while the proper number of experiments needed for its exploration can be determined from the minimum of the cost function. The performance of the approximate estimation of these statistics is demonstrated on two-dimensional normally distributed random data.
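The abstract does not state the approximate expression itself, so the following is only a hypothetical sketch of the general idea it describes: replacing the entropy integral H = -∫ f ln f dx with a sum over the measured samples. It assumes a leave-one-out Gaussian kernel density estimate and Scott's bandwidth rule; both are illustrative choices, not necessarily the estimator used in the paper, and the redundancy and information cost function are not reproduced here. The check mirrors the abstract's test case: two-dimensional normally distributed random data, whose exact entropy is known.

```python
import numpy as np


def loo_kde_entropy(x: np.ndarray, bandwidth: float) -> float:
    """Hypothetical sample-sum estimate of differential entropy (in nats).

    Approximates H = -integral f ln f dx by -(1/N) * sum_i ln f_hat_{-i}(x_i),
    where f_hat_{-i} is a Gaussian kernel density estimate built from all
    samples except x_i (leave-one-out). This is an assumed estimator chosen
    for illustration, not the paper's formula.
    """
    n, d = x.shape
    # Squared Euclidean distances between every pair of samples.
    diff = x[:, None, :] - x[None, :, :]
    sq_dist = np.sum(diff**2, axis=-1)
    # Isotropic Gaussian kernel, normalised in d dimensions.
    norm = (2.0 * np.pi * bandwidth**2) ** (d / 2)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth**2)) / norm
    np.fill_diagonal(kernel, 0.0)  # exclude each sample's own contribution
    f_hat = kernel.sum(axis=1) / (n - 1)
    return float(-np.mean(np.log(f_hat)))


# Test case: standard two-dimensional normal data (seed fixed for repeatability).
rng = np.random.default_rng(0)
n, d = 1000, 2
x = rng.standard_normal((n, d))
bandwidth = n ** (-1.0 / (d + 4))  # Scott's rule for unit-variance data (assumed)
entropy_est = loo_kde_entropy(x, bandwidth)
entropy_exact = 0.5 * d * np.log(2.0 * np.pi * np.e)  # exact entropy of N(0, I_2)
print(f"estimated entropy {entropy_est:.3f} nats, exact {entropy_exact:.3f} nats")
```

The leave-one-out form is used because including each sample in its own density estimate biases the sum downward; the pairwise distance matrix makes the sketch O(N²) in time and memory, which is adequate for a few thousand samples.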

journal: None

category: physics.data-an physics.comp-ph