How do I track down memory that a multithreaded Python program never releases?

Posted by 沉沉浮浮 in Python编程

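    I'm a beginner and wrote a multithreaded crawler. After all threads have finished, the process still holds about 1.5 GB of memory (the more tasks I run, the more memory stays unreleased). I don't know how to find out where the problem is, and I can't tell from the pympler output what is actually holding the memory. How should I go about diagnosing this properly?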

    The function that runs the threads:

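        import threadpool
        from pympler import tracker

        def mainfunc(tasknum, thread):
            tr = tracker.SummaryTracker()
            tr.print_diff()  # baseline summary before any work is queued
            # One task per index, passed to childfunc as a string.
            task_ids = [str(i) for i in range(tasknum)]
            pool = threadpool.ThreadPool(thread)
            work_requests = threadpool.makeRequests(childfunc, task_ids)
            for req in work_requests:
                pool.putRequest(req)
            pool.wait()  # block until every queued request has been processed
            # Growth since the baseline; work_requests is still referenced at this point.
            tr.print_diff()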

    Output of tr.print_diff():

    At startup:

    
                         types |   # objects |   total size
    ========================== | =========== | ============
                          list |        3741 |    350.84 KB
                           str |        3739 |    260.01 KB
                           int |         673 |     18.40 KB
                          dict |           2 |    352     B
                         tuple |           4 |    256     B
                          code |           1 |    144     B
         function (store_info) |           1 |    136     B
                          cell |           2 |     96     B
      functools._lru_list_elem |           1 |     80     B
                        method |          -1 |    -64     B

    After all threads have finished:

    
                                    types |   # objects |   total size
    ===================================== | =========== | ============
                                     dict |      202860 |     43.69 MB
                                     list |      100169 |      8.47 MB
                                      str |      102446 |      5.62 MB
                   threadpool.WorkRequest |      100000 |      5.34 MB
                                      int |      100836 |      3.08 MB
                       _io.BufferedReader |         294 |      2.35 MB
                                    tuple |        1480 |     93.30 KB
                                     type |          76 |     85.98 KB
                                     code |         572 |     80.57 KB
                                    bytes |        1219 |     51.49 KB
                                      set |          32 |     43.50 KB
                            socket.socket |         294 |     27.56 KB
           pymysql.connections.Connection |         294 |     16.08 KB
                          socket.SocketIO |         294 |     16.08 KB
      DBUtils.SteadyDB.SteadyDBConnection |         294 |     16.08 KB

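    Here is a minimal script that reproduces the problem. After it prints done, htop shows python3 still holding that memory, and it is not released unless the process is killed. (I can't post links here, so the URL is base64-encoded.)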

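    #!/usr/bin/env python3
    # -*- coding: UTF-8 -*-
    import base64
    import time

    import requests
    import threadpool

    # One Session shared by all worker threads.
    session = requests.Session()

    def childfunc(task_id):
        # The URL is stored base64-encoded; decode the bytes to a str before use.
        url = base64.b64decode('aHR0cHM6Ly91cGxvYWQud2lraW1lZGlhLm9yZy93aWtpcGVkaWEvY29tbW9ucy9mL2ZmL1BpemlnYW5pXzEzNjdfQ2hhcnRfMTBNQi5qcGc=').decode('utf-8')
        # Download the file; the response is dropped when the function returns.
        session.get(url, timeout=(5, 60))

    def mainfunc(tasknum, thread):
        task_ids = [str(i) for i in range(tasknum)]
        pool = threadpool.ThreadPool(thread)
        work_requests = threadpool.makeRequests(childfunc, task_ids)
        for req in work_requests:
            pool.putRequest(req)
        pool.wait()
        print('done')
        # Keep the process alive so its memory usage can be inspected in htop.
        while True:
            time.sleep(1)

    if __name__ == '__main__':
        mainfunc(10000, 50)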
