The 3σ rule (three-sigma rule) / the 68–95–99.7 rule
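In statistics, the 68–95–99.7 rule gives the percentage of values in a normal distribution that lie within one, two, and three standard deviations of the mean. The more precise figures are 68.27%, 95.45%, and 99.73%. In mathematical notation, where X is an observation of a normally distributed random variable, μ is the mean of the distribution, and σ is its standard deviation:
Pr(μ − 1σ ≤ X ≤ μ + 1σ) ≈ 0.6827
Pr(μ − 2σ ≤ X ≤ μ + 2σ) ≈ 0.9545
Pr(μ − 3σ ≤ X ≤ μ + 3σ) ≈ 0.9973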
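The three percentages can be checked numerically. The sketch below relies on the standard identity that, for a normal distribution, the probability of falling within k standard deviations of the mean is erf(k/√2). The function name `normal_coverage` is made up for this illustration.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    # For X ~ N(mu, sigma^2): P(|X - mu| <= k * sigma) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {normal_coverage(k):.2%}")
```

The result does not depend on μ or σ, which is why the rule can be stated once for every normal distribution.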
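In the experimental sciences there is a matching three-sigma rule of thumb for the normal distribution. It is a simple heuristic: "nearly all" values lie within three standard deviations of the mean, so in practice a probability of 99.7% can be treated as "near certainty". Whether that heuristic is appropriate depends on how "significant" is defined in the field in question, and the definition varies between fields. In the social sciences, for example, a result is commonly regarded as significant at a confidence level of plus or minus two standard deviations (95%). In particle physics, by contrast, claiming the discovery of a new particle requires a confidence level of plus or minus five standard deviations (99.99994%).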
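To get a feel for how far apart these thresholds are, it helps to look at the probability of landing outside the band rather than inside it. This is a minimal sketch under the same normality assumption. `two_sided_tail` is a hypothetical helper name, and `math.erfc` is used so that the very small five-sigma tail is not lost to rounding.

```python
import math


def two_sided_tail(k):
    """Probability that a normal variable falls more than k sigma from its mean."""
    # Complement of erf(k / sqrt(2)); erfc stays accurate for tiny probabilities.
    return math.erfc(k / math.sqrt(2))


for k in (2, 3, 5):
    p = two_sided_tail(k)
    print(f"{k} sigma: coverage {1 - p:.5%}, roughly 1 in {1 / p:,.0f} falls outside")
```

Going from two to five standard deviations changes the chance of a false alarm from a few percent to less than one in a million, which is why the two fields can reasonably disagree about what counts as significant.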
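There is also a corresponding three-sigma rule for distributions that are not normal. Even then, at least 88.8% of the probability (the exact bound is 8/9) lies within three standard deviations of the mean, which follows from Chebyshev's inequality. For unimodal distributions, the probability within three standard deviations is at least 95%, and for distributions that satisfy certain additional conditions it is at least 98%.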
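The two distribution-free figures can be reproduced directly from their bounds. Chebyshev's inequality gives 1 − 1/k² for any distribution with finite variance. The unimodal figure matches the Vysochanskij-Petunin inequality, 1 − 4/(9k²), which the text above does not name, so treat that attribution as an assumption of this sketch. The function names are illustrative.

```python
import math


def chebyshev_lower_bound(k):
    """Lower bound on P(|X - mu| < k * sigma) for any distribution with finite variance."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1 - 1 / k**2


def unimodal_lower_bound(k):
    """Vysochanskij-Petunin lower bound for unimodal distributions."""
    # The inequality only holds in this form for k > sqrt(8/3), about 1.63.
    if k <= math.sqrt(8 / 3):
        raise ValueError("bound requires k > sqrt(8/3)")
    return 1 - 4 / (9 * k**2)


print(f"any distribution, 3 sigma: {chebyshev_lower_bound(3):.4f}")
print(f"unimodal, 3 sigma:         {unimodal_lower_bound(3):.4f}")
```

These are guaranteed minimums, not typical values: a normal distribution, which is unimodal, reaches 99.73% at three standard deviations.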
Finally, here is an implementation from Metis, an open-source tool for anomaly detection on time-series data.
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
"""
Tencent is pleased to support the open source community by making Metis available.
Copyright (C) 2018 THL A29 Limited, a Tencent company. All rights reserved.
Licensed under the BSD 3-Clause License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
https://opensource.org/licenses/BSD-3-Clause
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
"""
import numpy as np


class Statistic(object):
    """
    In statistics, the 68-95-99.7 rule is a shorthand used to remember the percentage of values
    that lie within a band around the mean in a normal distribution with a width of two, four and
    six standard deviations, respectively; more accurately, 68.27%, 95.45% and 99.73% of the values
    lie within one, two and three standard deviations of the mean, respectively.
    WIKIPEDIA: https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule
    """

    def __init__(self, index=3):
        """
        :param index: multiple of standard deviation
        :type index: int or float
        """
        self.index = index

    def predict(self, X):
        """
        Predict if a particular sample is an outlier or not.

        :param X: the time series to check; the last value is tested against the earlier ones
        :type X: pandas.Series (or any array-like)
        :return: 1 denotes normal, 0 denotes abnormal
        """
        # Convert to an array so that -1 indexes by position, not by pandas label.
        values = np.asarray(X, dtype=float)
        if abs(values[-1] - np.mean(values[:-1])) > self.index * np.std(values[:-1]):
            return 0
        return 1
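To see the check in action without installing anything, here is a standard-library restatement of the same test. It is a sketch, not Metis code: `is_normal_point` is a hypothetical name, and `statistics.pstdev` is used because it computes the population standard deviation, the same quantity `np.std` returns by default.

```python
import statistics


def is_normal_point(series, index=3):
    """Return 1 if the last value is within index * std of the earlier values' mean, else 0."""
    history, latest = list(series[:-1]), series[-1]
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)  # population std, like np.std's default
    return 0 if abs(latest - mean) > index * std else 1


steady = [10, 11, 9, 10, 12, 10, 9, 11, 10]
print(is_normal_point(steady + [10.5]))  # latest value close to the history
print(is_normal_point(steady + [25]))    # latest value far from the history
```

Note that the newest point is excluded from the mean and standard deviation, so a large spike cannot inflate the threshold it is judged against. If the history is perfectly constant, the standard deviation is zero and any change at all is flagged as abnormal.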
Text and images from: http://www.chezaiyi.cn/psychology/320770.html
Code from: https://github.com/Tencent/Metis