Regression Algorithms for Data Analysis in Python

2024-04-03 21:46 • python • 921 views

1. Linear regression, multiple regression, and logistic regression

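Regression uses a function to describe the relationship between variables. Linear regression models that relationship with a linear function. Multiple regression is linear regression with more than one independent variable. Data analysis problems fall into two broad classes, regression and classification, and each class has its own family of methods. Logistic regression extends linear regression to classification problems: it passes a linear combination of the features through the logistic (sigmoid) function to obtain a class probability.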
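To make the link between linear and logistic regression concrete, here is a minimal sketch. The weights `w`, the bias `b`, the inputs `x`, and the helper names `sigmoid` and `predict_class` are all hypothetical and chosen only for illustration; nothing here is fitted to data.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(x, w, b, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold, else 0."""
    score = x @ w + b  # the same linear form used by linear regression
    return (sigmoid(score) >= threshold).astype(int)

# Hypothetical weights and inputs, for illustration only.
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([[1.0, 0.0],   # linear score  2.5
              [0.0, 3.0]])  # linear score -2.5
labels = predict_class(x, w, b)
```

The only difference from linear regression at prediction time is the sigmoid and the threshold; fitting the weights is a separate step that this sketch leaves out.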

As an example, consider how the price of a house relates to its floor area and the floor it is on:

"""
 Area     Floor    Price
 100       8       20000
 120       6       19000
 80       20       15000
 60       23       14000
 150      10       18000
 150      20       17000
 90       20       17000
"""

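import numpy as np

class House_Predict():
    def __init__(self):
        x = np.array([[100, 8], [120, 6], [80, 20], [60, 23], [150, 10], [150, 20], [90, 20]], dtype=float)
        self.x = np.hstack((x, np.ones((x.shape[0], 1))))  # append a constant (intercept) column
        self.y = np.array([[20000], [19000], [15000], [14000], [18000], [17000], [17000]], dtype=float)
        # Normal equation for multiple regression: w = (X^T X)^(-1) X^T y
        self.w = np.linalg.inv(self.x.T @ self.x) @ self.x.T @ self.y

    def predict(self, data):
        # data must be [area, floor, 1]; the trailing 1 multiplies the intercept
        return self.w.T @ np.asarray(data, dtype=float).reshape(3, 1)

    # Mean squared residual of the fit on the training data
    def data_variance(self):
        total = 0.0
        for i in range(self.x.shape[0]):
            # true division here; floor division (//) would truncate each term
            total += ((self.x[i] @ self.w - self.y[i]) ** 2).item() / self.x.shape[0]
        return total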
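As a cross-check on the normal-equation fit, the same data can be passed to NumPy's least-squares solver. This is a sketch rather than part of the original class: the variable names and the query house (area 110, floor 12) are assumptions made for illustration.

```python
import numpy as np

# Data from the table: columns are area and floor; y is the price.
X = np.array([[100, 8], [120, 6], [80, 20], [60, 23],
              [150, 10], [150, 20], [90, 20]], dtype=float)
y = np.array([20000, 19000, 15000, 14000, 18000, 17000, 17000], dtype=float)

# Append a column of ones so the last coefficient acts as the intercept.
X1 = np.hstack((X, np.ones((X.shape[0], 1))))

# Normal equation, solved as a linear system instead of forming an inverse.
w_normal = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Least-squares solver; rcond=None selects NumPy's default cutoff.
w_lstsq, *_ = np.linalg.lstsq(X1, y, rcond=None)

mse = np.mean((X1 @ w_lstsq - y) ** 2)  # mean squared residual on the training data
new_house = np.array([110, 12, 1.0])    # hypothetical query; trailing 1 is the intercept term
predicted_price = new_house @ w_lstsq
```

Both routes should give the same coefficients up to rounding. `np.linalg.lstsq` is usually the safer choice when the columns of the design matrix are nearly collinear, because it does not rely on inverting X^T X.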

2. Gradient descent

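Solving linear regression in closed form by least squares is convenient. When the closed-form solution is impractical, for example when X^T X is singular or too large to invert, gradient descent can be used to compute the regression coefficients iteratively. With a suitably small step size, the coefficients approach the least-squares coefficients as the number of iterations grows.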
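The update rule behind this iteration can be written out as follows. The notation is an assumption made for this sketch: m is the number of samples, α is the step size, X is the data matrix with a constant column appended, and θ is the coefficient vector.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,\lVert y - X\theta \rVert^{2} \\
\nabla_{\theta} J(\theta) &= -\frac{1}{m}\, X^{T}\,(y - X\theta) \\
\theta &\leftarrow \theta + \frac{\alpha}{m}\, X^{T}\,(y - X\theta)
\end{aligned}
```

Setting the gradient to zero gives X^T X θ = X^T y, which is the normal equation w = (X^T X)^(-1) X^T y from section 1. That is why the iterates converge to the least-squares coefficients when α is small enough.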

# Gradient descent
import numpy as np
import matplotlib.pyplot as plt

class House_Predict1():
    def __init__(self):
        x = np.array([[100, 8], [120, 6], [80, 20], [60, 23], [150, 10], [150, 20], [90, 20]], dtype=float)
        self.x = np.hstack((x, np.ones((x.shape[0], 1))))  # append a constant (intercept) column
        self.y = np.array([[20000], [19000], [15000], [14000], [18000], [17000], [17000]], dtype=float)
        self.error_list = []

    def gra_near(self):
        length = self.x.shape[0]
        thea0, thea1, thea2 = 0, 0, 0
        max_count = 500000  # maximum number of iterations
        error_near = 0.1  # stop when the error changes by less than this
        grad = 0.0001  # step size (learning rate)
        thea_array = np.array([[thea1], [thea2], [thea0]], dtype=float)  # thea0 is the intercept
        error = 0
        for i in range(max_count):
            # step along the negative gradient of the halved mean squared error
            thea_array += grad / length * (self.x.T @ (self.y - self.x @ thea_array))
            residual = self.y - self.x @ thea_array
            error_temp = float(np.sum(residual ** 2))  # sum of squared residuals
            self.error_list.append(error_temp)
            if np.abs(error - error_temp) < error_near:  # error has stopped changing: end the iteration
                break
            error = error_temp
        return thea_array  # fitted coefficients: [area, floor, intercept]

    # Plot the error against the iteration number
    def plt_line(self):
        plt.plot(list(range(len(self.error_list))), self.error_list)
        plt.show()
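The claim that gradient descent approaches the least-squares coefficients can be checked numerically. The sketch below makes one change that is not in the code above: it standardizes the two features first, which lets a much larger step size converge in a few hundred iterations. The function name `gradient_descent` and the values of `step`, `max_iter`, and `tol` are assumptions chosen for illustration.

```python
import numpy as np

X = np.array([[100, 8], [120, 6], [80, 20], [60, 23],
              [150, 10], [150, 20], [90, 20]], dtype=float)
y = np.array([20000, 19000, 15000, 14000, 18000, 17000, 17000], dtype=float)

# Standardize each feature (an assumption of this sketch), then append the intercept column.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs = np.hstack(((X - mu) / sigma, np.ones((X.shape[0], 1))))

def gradient_descent(X, y, step=0.1, max_iter=10000, tol=1e-10):
    """Minimize the halved mean squared error; return coefficients and the error history."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(max_iter):
        residual = y - X @ theta
        history.append(np.mean(residual ** 2))
        update = step / m * (X.T @ residual)  # negative gradient times step size
        theta += update
        if np.max(np.abs(update)) < tol:      # updates have become negligible
            break
    return theta, history

theta_gd, history = gradient_descent(Xs, y)
theta_ls, *_ = np.linalg.lstsq(Xs, y, rcond=None)  # least-squares reference
```

Here `theta_gd` and `theta_ls` are expressed in standardized units, so they are not directly comparable with the coefficients fitted on the raw area and floor values.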
