Python爬虫相关】如何让导出的TXT文件包含遍历过的所有内容
课程设计
1
问题遇到的现象和发生背景
导出的TXT文件不包含所有页面的内容,只有最后一个页面的内容
import requests
from bs4 import BeautifulSoup
for i in range(5):
url ='http://spiderbuf.cn/trainingbox?level=4&pageno='+str(i+1)
headers ={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'}
dataSource =requests.get(url,headers=headers)
dataText = dataSource.text
soup = BeautifulSoup(dataText,'lxml')
with open('03.txt','w',encoding='utf-8') as fp:
for trs in soup.select('tr'):
tds=trs.select('td')
s=''
for td in tds:
td_text=td.get_text()
s+=str(td_text)+'|'
print(s)
fp.write(s + '\n')
fp.close()
运行结果及报错内容
打印是能打印出所有页面结果的,但文件是没有包含所有页面结果的,只有第五页也就是最后一页的结果
-
调整下层次试试
import requests from bs4 import BeautifulSoup with open('03.txt', 'w', encoding='utf-8') as fp: for i in range(5): url = 'http://spiderbuf.cn/trainingbox?level=4&pageno=' + str(i + 1) headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'} dataSource = requests.get(url, headers=headers) dataText = dataSource.text soup = BeautifulSoup(dataText, 'lxml') for trs in soup.select('tr'): tds = trs.select('td') s = '' for td in tds: td_text = td.get_text() s += str(td_text) + '|' print(s) fp.write(s + '\n')
-
with open('03.txt','w',encoding='utf-8') as fp: 改为: with open('03.txt','a',encoding='utf-8') as fp: 'w' 会覆盖掉 a 追加
发表回复