I am learning how to use concurrency with executor.map() and executor.submit().

I have a list of 20 URLs and want to send all 20 requests at the same time. The problem is that .submit() returns results in a different order from the list they were given in. I have read that map() can do what I need, but I don't know how to write the code with it.

The code below works fine for me.

Question: is there a map() block equivalent to the code below, or a way to sort the results from submit() into the same order as the given list?
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
Posted on 2020-08-21 06:37:21

Without using map, you can build the future_to_url dictionary with enumerate so that each future maps not only to its URL but also to its index in the original list. Then, as concurrent.futures.as_completed(future_to_url) yields each completed future, store it in a second dictionary keyed by that index; finally, iterate the indices over the length of that dictionary to read the entries in the same order as the corresponding items in the original list:
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its index and URL
    future_to_url = {
        executor.submit(load_url, url, 60): (i, url) for i, url in enumerate(URLS)
    }
    futures = {}
    for future in concurrent.futures.as_completed(future_to_url):
        i, url = future_to_url[future]
        futures[i] = url, future
    for i in range(len(futures)):
        url, future = futures[i]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
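A simpler variant of the same idea, as a minimal sketch rather than part of the original answer: executor.submit() hands back the futures in submission order, so keeping them in a plain list and calling result() on each one in turn already yields the results in list order. The downloads still run concurrently; you merely block on them in order. This reuses load_url and URLS from the question:

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # The futures list has the same order as URLS
    futures = [executor.submit(load_url, url, 60) for url in URLS]
    for url, future in zip(URLS, futures):
        try:
            data = future.result()  # blocks until this particular download finishes
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))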
Posted on 2020-08-21 05:24:24

Here is the map() version of your existing code. Note that the callback now takes a tuple as its argument. I have added a try/except inside the callback so that iterating over the results does not raise. The results are ordered according to the input list.
from concurrent.futures import ThreadPoolExecutor
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://www.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(tt):  # tt is a (url, timeout) tuple
    url, timeout = tt
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return (url, conn.read())
    except Exception as ex:
        print("Error:", url, ex)
        return (url, "")  # on error, return an empty string

with ThreadPoolExecutor(max_workers=5) as executor:
    results = executor.map(load_url, [(u, 60) for u in URLS])  # pass url and timeout as a tuple to the callback
    executor.shutdown(wait=True)  # wait for all tasks to complete

print("Results:")
for r in results:  # ordered results; would raise here if exceptions were not handled in the callback
    print(' %r page is %d bytes' % (r[0], len(r[1])))

Output:
Error: http://www.wsj.com/ HTTP Error 404: Not Found
Results:
'http://www.foxnews.com/' page is 320028 bytes
'http://www.cnn.com/' page is 1144916 bytes
'http://www.wsj.com/' page is 0 bytes
'http://www.bbc.co.uk/' page is 279418 bytes
'http://some-made-up-domain.com/' page is 64668 bytes
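As a usage note beyond the answer above: executor.map() also accepts multiple iterables and passes their elements to the callback positionally, so the tuple packing can be avoided and the callback can keep separate url and timeout parameters. A minimal sketch of that variant, reusing URLS from the answer above (the two-parameter load_url here is an illustrative adaptation, not the answer's code):

from concurrent.futures import ThreadPoolExecutor
import urllib.request

def load_url(url, timeout):  # separate parameters instead of a tuple
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return (url, conn.read())
    except Exception as ex:
        print("Error:", url, ex)
        return (url, "")

with ThreadPoolExecutor(max_workers=5) as executor:
    # map() pairs the iterables element by element:
    # load_url(URLS[0], 60), load_url(URLS[1], 60), ...
    results = executor.map(load_url, URLS, [60] * len(URLS))

for url, data in results:  # still in the same order as URLS
    print(' %r page is %d bytes' % (url, len(data)))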