Hello! I'm Xiaoxiao. This is my third post this week, and this time I'll show you a way to scrape data, using Douyin as the example.

Preparation

You will need an Android phone with these apps installed:
HttpCanary
The Douyin app

Downloading the apps

On the phone, open https://play.google.com/store/apps/details?id=com.guoshi.httpcanary&hl=zh&gl=US as shown below:
[Figure: HttpCanary on Google Play]
Install the app and launch it, as shown below:
[Figure: HttpCanary installed and running]

Download and launch Douyin

[Figure: Douyin installed and running]

Capturing packets

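First, a quick primer on packet capture. Packet capture works much like a man-in-the-middle attack: it intercepts and records the HTTP traffic the user sends and receives, so that the traffic can be analyzed.
Capturing on Android is quite simple: just tap the paper-plane button at the bottom of the screen.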
[Figure: the paper-plane button in HttpCanary]
Capture has now started.

Tap pause, then open any of the captured requests.
[Figure: details of a captured request]
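This view shows the captured content in detail, and it is where the analysis happens:
Overview: summary information about the request.
Request: the request headers and the request body.
Response: the response returned by the server.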
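These tabs are views onto the parts of a single HTTP exchange. As a rough illustration (not how HttpCanary is implemented), the Python sketch below splits a raw HTTP/1.1 request, written out as text, into the request line, query parameters, headers and body. The function name `parse_raw_request` and the sample request are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def parse_raw_request(raw):
    """Split a raw HTTP/1.1 request into the parts shown in a capture tool."""
    # Headers and body are separated by the first empty line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line: METHOD SP REQUEST-TARGET SP HTTP-VERSION
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    parts = urlsplit(target)
    return {
        "method": method,
        "path": parts.path,
        "query": parse_qs(parts.query),
        "version": version,
        "headers": headers,
        "body": body,
    }


raw = (
    "GET /aweme/v1/search/sug/?keyword=abc&source=user HTTP/1.1\r\n"
    "Host: aweme.snssdk.com\r\n"
    "Accept-Encoding: gzip\r\n\r\n"
)
print(parse_raw_request(raw)["query"])
```

This is a simplified parser for illustration only: it ignores repeated headers and chunked bodies.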

Capturing the Douyin user search list

Now let's capture Douyin's user search list. Here we only capture the first entry.
First, clear the previously captured requests, as shown below:
[Figure: clearing previous captures]
Tap the button to start capturing.
[Figure: starting the capture]
Then quickly switch to Douyin and search for a user, as shown below:
[Figure: searching for a user in Douyin]
Then stop the capture. In total, 35 HTTP requests were captured.
[Figure: the list of captured requests]
Next, go through them one by one to find the HTTP request that returns the user list.

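There are two common ways to do this. The first is by response: look at what each request returns and judge which one is likely to be the one you want. The second is by name: for a search, the URL will almost certainly contain words such as search or keyword; for a like, it will contain like. These are rules of thumb, not guarantees.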
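The name-based rule of thumb can be turned into a quick filter over the captured URLs. This is a minimal sketch: the hint words per action (`HINTS`) and the function `rank_candidates` are hypothetical, and a match only suggests a candidate; it does not prove that it is the right request.

```python
from urllib.parse import urlsplit

# Hypothetical hint words per action, following the rule of thumb above;
# "sug" is included because the Douyin search URL in this post uses /search/sug/.
HINTS = {
    "search": ("search", "keyword", "sug"),
    "like": ("like",),
}


def rank_candidates(urls, action):
    """Return captured URLs whose path or query contains hint words, best match first."""
    hints = HINTS[action]
    scored = []
    for url in urls:
        parts = urlsplit(url)
        text = (parts.path + "?" + parts.query).lower()
        hits = sum(1 for word in hints if word in text)
        if hits:
            scored.append((hits, url))
    # Most hint words first; the sort is stable, so ties keep capture order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [url for _, url in scored]


captured = [
    "https://example.com/api/feed/?count=6",
    "https://example.com/api/search/history/",
    "https://aweme.snssdk.com/aweme/v1/search/sug/?keyword=abc&source=user",
]
print(rank_candidates(captured, "search"))
```

Only the path and query are checked, so the host name never produces a false match.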

The analysis identified the following request:

https://aweme.snssdk.com/aweme/v1/search/sug/?keyword=%E5%B0%8F%E6%A9%99%E5%AD%90&source=user&from_group_id=6901212774668504323&os_api=23&device_type=MI+5s&ssmix=a&manifest_version_code=130701&dpi=240&uuid=910000000073543&app_name=aweme&version_name=13.7.0&ts=1606905781&cpu_support64=false&app_type=normal&appTheme=ddark&ac=wifi&host_abi=armeabi-v7a&update_version_code=13709900&channel=aweGW&_rticket=1606905782673&device_platform=android&iid=2497731620770349&version_code=130700&cdid=8ef1cc20-e0a7-478a-9193-4f396474f75e&openudid=ea10cded4241887b&device_id=69441294706&resolution=810*1440&os_version=6.0.1&language=zh&device_brand=Xiaomi&aid=1128

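This very long URL is the search request we are after. Judging by its path (/aweme/v1/search/sug/) and the sug_list field in the response, it returns search suggestions for the typed keyword.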
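To reuse this request for a different search term, the part to change is the keyword query parameter; %E5%B0%8F%E6%A9%99%E5%AD%90 is simply the URL-encoded form of 小橙子. The sketch below uses Python's `urllib.parse` to read the decoded keyword and to build a copy of the URL with a new one. The helper names `get_param` and `with_keyword` are hypothetical, and whether the server accepts an edited URL without refreshing other parameters such as ts and _rticket (or anything else the app adds) is not verified here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def get_param(url, name):
    """Return the decoded value of query parameter `name`, or None if absent."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True)).get(name)


def with_keyword(url, keyword):
    """Return a copy of `url` whose `keyword` parameter is replaced, order preserved."""
    parts = urlsplit(url)
    params = [
        (k, keyword if k == "keyword" else v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # urlencode percent-encodes UTF-8 text, e.g. 小橙子 -> %E5%B0%8F%E6%A9%99%E5%AD%90
    return urlunsplit(parts._replace(query=urlencode(params)))


url = ("https://aweme.snssdk.com/aweme/v1/search/sug/"
       "?keyword=%E5%B0%8F%E6%A9%99%E5%AD%90&source=user&device_type=MI+5s")
print(get_param(url, "keyword"))
print(with_keyword(url, "abc"))
```

urlencode may escape some characters differently from the app (for example, the * in resolution=810*1440 becomes %2A); the decoded values are the same.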
Its details in the app look like this:
[Figure: the request's details in HttpCanary]
Export this request and copy it into Postman on your computer, as shown below:
[Figure: the request imported into Postman]
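If you would rather replay the request from a script than from Postman, the Python standard library is enough. This is a minimal sketch, assuming you paste in the URL and headers shown in HttpCanary; `build_replay`, `decode_body` and `replay` are hypothetical names. The app may attach headers that are only valid for a short time, so a replayed request is not guaranteed to keep working.

```python
import gzip
import json
import urllib.request


def build_replay(url, headers):
    """Build (but do not send) a GET request carrying the captured headers."""
    return urllib.request.Request(url, headers=headers, method="GET")


def decode_body(raw, content_encoding):
    """Undo gzip if the server compressed the body, then parse it as JSON."""
    # urllib does not decompress automatically, and a copied
    # "Accept-Encoding: gzip" header invites a compressed reply.
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def replay(url, headers, timeout=10):
    """Send the replayed request and return the parsed JSON response."""
    req = build_replay(url, headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Only `build_replay` and `decode_body` run offline; `replay` performs a real network request.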
Then send the request from Postman. It goes through and the response comes back:

{
    "sug_list": [
        {
            "pos": [
                {
                    "begin": 0,
                    "end": 2
                }
            ],
            "content": "小橙子",
            "sug_type": "",
            "word_record": {
                "group_id": "6541999374455543053",
                "words_position": 0,
                "words_content": "小橙子",
                "words_source": "sug"
            },
            "extra_info": {
                "combine_utility": "0.105454",
                "is_rich_sug": "0",
                "latency": "52489",
                "recall_reason": "orion_qse_recall|origin_query|bg_search_after_read|aweme_index_query_word_shortterm|viking_recall|new_user_word",
                "rich_sug_type": "0",
                "score": "10033.483513"
            }
        },
        {
            "pos": [
                {
                    "begin": 0,
                    "end": 2
                }
            ],
            "content": "小橙子先生",
            "sug_type": "",
            "word_record": {
                "group_id": "6663331049126335757",
                "words_position": 1,
                "words_content": "小橙子先生",
                "words_source": "sug"
            },
            "extra_info": {
                "combine_utility": "0.075498",
                "is_rich_sug": "0",
                "latency": "52489",
                "recall_reason": "orion_qse_recall|aweme_orion_word|aweme_index_query_word_shortterm|new_user_word",
                "rich_sug_type": "0",
                "score": "35.694026"
            }
        },
        {
            "pos": [
                {
                    "begin": 0,
                    "end": 2
                }
            ],
            "content": "小橙子🍊",
            "sug_type": "",
            "word_record": {
                "group_id": "6605856988817528077",
                "words_position": 2,
                "words_content": "小橙子🍊",
                "words_source": "sug"
            },
            "extra_info": {
                "combine_utility": "0.022446",
                "is_rich_sug": "0",
                "latency": "52489",
                "recall_reason": "new_user_word|aweme_index_query_word_shortterm",
                "rich_sug_type": "0",
                "score": "23.660895"
            }
        },
        {
            "pos": [
                {
                    "begin": 0,
                    "end": 2
                }
            ],
            "content": "小橙子向李尖尖道歉",
            "sug_type": "",
            "word_record": {
                "group_id": "6867526537433453828",
                "words_position": 3,
                "words_content": "小橙子向李尖尖道歉",
                "words_source": "sug"
            },
            "extra_info": {
                "combine_utility": "0.010760",
                "is_rich_sug": "0",
                "latency": "52489",
                "recall_reason": "orion_qse_recall|aweme_index_query_word_shortterm|aweme_orion_word",
                "rich_sug_type": "0",
                "score": "15.679590"
            }
        },
        {
            "pos": [
                {
                    "begin": 0,
                    "end": 2
                }
            ],
            "content": "小橙子姐姐",
            "sug_type": "",
            "word_record": {
                "group_id": "6595874137606984963",
                "words_position": 4,
                "words_content": "小橙子姐姐",
                "words_source": "sug"
            },
            "extra_info": {
                "combine_utility": "0.006576",
                "is_rich_sug": "0",
                "latency": "52489",
                "recall_reason": "orion_qse_recall|aweme_orion_word|aweme_index_query_word_shortterm|new_user_word",
                "rich_sug_type": "0",
                "score": "21.280143"
            }
        },
        {
            "pos": [
                {
                    "begin": 0,
                    "end": 2
                }
            ],
            "content": "小橙子妲己视频",
            "sug_type": "",
            "word_record": {
                "group_id": "6733562976994874637",
                "words_position": 5,
                "words_content": "小橙子妲己视频",
                "words_source": "sug"
            },
            "extra_info": {
                "combine_utility": "0.005108",
                "is_rich_sug": "0",
                "latency": "52489",
                "recall_reason": "orion_qse_recall|aweme_index_query_word_shortterm|aweme_orion_word",
                "rich_sug_type": "0",
                "score": "16.143941"
            }
        },
        {
            "pos": [
                {
                    "begin": 0,
                    "end": 2
                }
            ],
            "content": "小橙子2.0",
            "sug_type": "",
            "word_record": {
                "group_id": "6626934284127048967",
                "words_position": 6,
                "words_content": "小橙子2.0",
                "words_source": "sug"
            },
            "extra_info": {
                "combine_utility": "0.001549",
                "is_rich_sug": "0",
                "latency": "52489",
                "recall_reason": "aweme_orion_word|new_user_word",
                "rich_sug_type": "0",
                "score": "16.700139"
            }
        },
        {
            "pos": [
                {
                    "begin": 0,
                    "end": 2
                }
            ],
            "content": "小橙子吖",
            "sug_type": "",
            "word_record": {
                "group_id": "6657385354091386119",
                "words_position": 7,
                "words_content": "小橙子吖",
                "words_source": "sug"
            },
            "extra_info": {
                "combine_utility": "0.001493",
                "is_rich_sug": "0",
                "latency": "52489",
                "recall_reason": "new_user_word|aweme_index_query_word_shortterm",
                "rich_sug_type": "0",
                "score": "16.461999"
            }
        },
        {
            "pos": [
                {
                    "begin": 0,
                    "end": 2
                }
            ],
            "content": "小橙子摔下楼梯",
            "sug_type": "",
            "word_record": {
                "group_id": "6860077678121194759",
                "words_position": 8,
                "words_content": "小橙子摔下楼梯",
                "words_source": "sug"
            },
            "extra_info": {
                "combine_utility": "0.001470",
                "is_rich_sug": "0",
                "latency": "52489",
                "recall_reason": "orion_qse_recall|aweme_index_query_word_shortterm|aweme_orion_word",
                "rich_sug_type": "0",
                "score": "15.075646"
            }
        },
        {
            "pos": [
                {
                    "begin": 0,
                    "end": 2
                }
            ],
            "content": "小橙子妈妈🍊",
            "sug_type": "",
            "word_record": {
                "group_id": "6741513233120630030",
                "words_position": 9,
                "words_content": "小橙子妈妈🍊",
                "words_source": "sug"
            },
            "extra_info": {
                "combine_utility": "0.001376",
                "is_rich_sug": "0",
                "latency": "52489",
                "recall_reason": "new_user_word",
                "rich_sug_type": "0",
                "score": "14.622774"
            }
        }
    ],
    "status_code": 0,
    "status_msg": "",
    "rid": "20201202202753010198065013351E671F",
    "words_query_record": {
        "info": "{}",
        "words_source": "sug",
        "query_id": ""
    },
    "extra": {
        "now": 1606912074000,
        "logid": "20201202202753010198065013351E671F",
        "fatal_item_ids": [],
        "search_request_id": ""
    },
    "log_pb": {
        "impr_id": "20201202202753010198065013351E671F"
    }
}

[Figure: the response in Postman]
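Once the JSON is in hand, pulling out the suggested words takes only a few lines. A minimal sketch using the `json` module; the field names (status_code, status_msg, sug_list, content, word_record.words_position, extra_info.score) come from the response above, while the function name `suggestions` is hypothetical, and treating a non-zero status_code as an error is an assumption. Note that score arrives as a string, so it is converted to a float.

```python
import json


def suggestions(body):
    """Return (position, content, score) tuples from a search/sug response body."""
    data = json.loads(body)
    # Assumption: a non-zero status_code means the request was rejected.
    if data.get("status_code") != 0:
        raise ValueError(f"request failed: {data.get('status_msg')!r}")
    rows = []
    for item in data.get("sug_list", []):
        rows.append((
            item["word_record"]["words_position"],
            item["content"],
            float(item["extra_info"]["score"]),  # score is sent as a string
        ))
    # Order by the position at which each suggestion is listed.
    return sorted(rows, key=lambda row: row[0])
```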
At this point the data is in hand. What you do with it from here is up to you: save it, analyze it, and so on.
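If you want to keep the data, one simple option is a CSV file. This sketch writes (position, content, score) rows, i.e. words_position, content and extra_info.score from each sug_list entry; the function names `to_csv`, `to_csv_string` and `save_csv` are hypothetical.

```python
import csv
import io


def to_csv(rows, fh):
    """Write (position, content, score) rows, with a header line, to an open text file."""
    writer = csv.writer(fh)
    writer.writerow(["position", "content", "score"])
    writer.writerows(rows)


def to_csv_string(rows):
    """Return the CSV text as a string, handy for a quick preview."""
    buf = io.StringIO()
    to_csv(rows, buf)
    return buf.getvalue()


def save_csv(rows, path):
    """Save rows to `path`."""
    # newline="" is what the csv module expects; "utf-8-sig" adds a BOM so
    # that Excel detects UTF-8 and shows the Chinese text correctly.
    with open(path, "w", newline="", encoding="utf-8-sig") as fh:
        to_csv(rows, fh)


print(to_csv_string([(0, "小橙子", 10033.483513)]))
```

The csv module ends each row with \r\n by default, which spreadsheet programs handle fine.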

About the author

I'm Xiaoxiao, a programmer (and a Pisces~). See you next time!