AttributeError: 'module' object has no attribute 'urlopen'

146

I'm trying to use Python to download the HTML source code of a website, but I'm getting this error.

Traceback (most recent call last):  
    File "C:\Users\Sergio.Tapia\Documents\NetBeansProjects\DICParser\src\WebDownload.py", line 3, in <module>
     file = urllib.urlopen("http://www.python.org")
AttributeError: 'module' object has no attribute 'urlopen'

I'm following the guide here: http://www.boddie.org.uk/python/HTML.html

import urllib

file = urllib.urlopen("http://www.python.org")
s = file.read()
f.close()

#I'm guessing this would output the html source code?
print(s)

I'm using Python 3.

python python-3.x urllib

— pppery

245

That works in Python 2.x.

For Python 3, see the docs:

import urllib.request

with urllib.request.urlopen("http://www.python.org") as url:
    s = url.read()
    # I'm guessing this would output the html source code ?
    print(s)

— eumiro

3

Hi Eumiro, I'm guessing that the 'with' statement in Python closes the connection automatically once you're done using it? Similar to a using statement in C#?

@Sergio: exactly! And the indentation shows you where your file is still open.

— eumiro

Hello @eumiro, I get an "IndentationError: expected an indented block" error when I type s = url.read(). May I ask how I can solve it, please? x

— Karen Chan

@KarenChan you're missing the indentation before s = url.read(); do you have 4 spaces in front of it?

— numbermaniac
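
As the comments above say, the with block closes the response for you. One related detail the answer leaves implicit: in Python 3, read() returns bytes rather than str, so print(s) shows a b'...' literal. Here is a minimal sketch that combines both points. The helper name fetch_text and the UTF-8 fallback are my own assumptions for illustration, not part of the original answer.

```python
import urllib.request


def fetch_text(url, fallback_encoding="utf-8"):
    """Return the body of `url` as a str (hypothetical helper name)."""
    # The with block closes the response on exit, even if read() raises.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes in Python 3
        # Prefer the charset declared in the Content-Type header, if any;
        # the UTF-8 fallback is an assumption, not a guarantee.
        charset = response.headers.get_content_charset() or fallback_encoding
    return raw.decode(charset)
```

Calling fetch_text("http://www.python.org") should then give you text you can search or print directly, assuming the page really is encoded the way its headers claim.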

19

A Python 2+3 compatible solution is:

import sys

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    # Not Python 3 - today, it is most likely to be Python 2
    # But note that this might need an update when Python 4
    # might be around one day
    from urllib import urlopen


# Your code where you can use urlopen
with urlopen("http://www.python.org") as url:
    s = url.read()

print(s)

— Martin Thoma

1

with urlopen("http://www.python.org") as url: doesn't work in Python 2; it fails with AttributeError: addinfourl instance has no attribute '__exit__'. You need to write url = urlopen("http://www.python.org") instead.

— orshachar
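
One way around the problem described in the comment above is contextlib.closing, which wraps any object that has a close() method in a context manager. The sketch below is only a suggestion and not something the answer's author proposed; the function name fetch_bytes is made up for illustration.

```python
from contextlib import closing

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2


def fetch_bytes(url):
    """Read the whole response body (hypothetical helper name)."""
    # closing() calls response.close() on exit, so this works even when
    # the response object itself has no __exit__ (as in Python 2).
    with closing(urlopen(url)) as response:
        return response.read()
```

In Python 3 the extra closing() is harmless, since closing an already context-managed response just calls close() once at the end of the block.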

15

import urllib.request as ur
s = ur.urlopen("http://www.google.com")
sl = s.read()
print(sl)

In Python 3, urlopen lives in the "urllib.request" submodule, which has to be imported explicitly; a bare "import urllib" does not provide it.

— Manu Mariaraj

7

To get 'dataX = urllib.urlopen(url).read()' working in Python 3 (it would have been correct for Python 2), you only need to change two small things.

1: The urllib statement itself (add .request in the middle):

dataX = urllib.request.urlopen(url).read()

2: The import statement preceding it (change 'import urllib' to the following):

import urllib.request

And it should work in Python 3 :)

— Steven B. Peutz

3

import urllib.request as ur

filehandler = ur.urlopen ('http://www.google.com')
for line in filehandler:
    print(line.strip())

— Kamran
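
Note that in Python 3 each line produced by the loop above is a bytes object, so print shows it as b'...'. If you want plain text, here is a minimal sketch that decodes each line. The helper name iter_text_lines and the UTF-8 default are assumptions for illustration; the real encoding depends on the site.

```python
import urllib.request


def iter_text_lines(url, encoding="utf-8"):
    """Yield each line of the response as a stripped str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        for raw_line in response:  # each item is a bytes object
            yield raw_line.decode(encoding).strip()
```

You would use it as for line in iter_text_lines('http://www.google.com'): print(line).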

1

For Python 3, try something like this:

import urllib.request
urllib.request.urlretrieve('http://crcv.ucf.edu/THUMOS14/UCF101/UCF101/v_YoYo_g19_c02.avi', "video_name.avi")

It will download the video to the current working directory.

I got help HERE

— rocksyne
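
The Python 3 documentation lists urlretrieve under its "Legacy interface" section, so an alternative that sticks to urlopen may be worth knowing. This is only a sketch: the function name download_to_file is hypothetical, and it assumes you are happy to overwrite the destination file.

```python
import shutil
import urllib.request


def download_to_file(url, destination):
    """Stream the body of `url` into `destination` (hypothetical helper)."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out_file:
        # copyfileobj copies in chunks, so a large video is never
        # held in memory all at once.
        shutil.copyfileobj(response, out_file)
    return destination
```

For the example above, that would be download_to_file(video_url, "video_name.avi"), where video_url is whatever address you want to fetch.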

1

Solution for Python 3:

from urllib.request import urlopen

url = 'http://www.python.org'
file = urlopen(url)
html = file.read()
print(html)

— Banjali

Simple and easy to understand for beginners. Thanks

— SHR

1

Change TWO lines:

import urllib.request #line1

#Replace
urllib.urlopen("http://www.python.org")
#To
urllib.request.urlopen("http://www.python.org") #line2

If you get an HTTP 403 Forbidden error, try the following:

siteurl = "http://www.python.org"

req = urllib.request.Request(siteurl, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36'})
pageHTML = urllib.request.urlopen(req).read()

Hope your problem is resolved.

— Shahzaib Chadhar
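
If you would rather handle the 403 than let it crash the script, here is a rough sketch that wraps the same idea in a try/except. The names build_request and fetch_or_none and the short "Mozilla/5.0" string are placeholders I chose for illustration; whether a given site accepts any particular User-Agent depends entirely on that server.

```python
import urllib.error
import urllib.request

# Placeholder value (an assumption); substitute a full browser string if needed.
USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=USER_AGENT):
    """Create a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_or_none(url):
    """Return the response body, or None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # err.code is the numeric status, e.g. 403 for Forbidden.
        print("HTTP error:", err.code, err.reason)
        return None
```

Returning None on failure is just one possible design; re-raising or retrying may suit your script better.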

0

One possible way to do it:

import urllib
...

try:
    # Python 2
    from urllib2 import urlopen
except ImportError:
    # Python 3
    from urllib.request import urlopen

— Vasyl Lyashkevych

0

Use the six module to make your code compatible with both Python 2 and Python 3:

urllib.request.urlopen("<your-url>")

— Rajat Shukla

You can import it from the six module this way: from six.moves import urllib

— Rajat Shukla

0

Your code was written for Python 2.x. In Python 3, you can do it like this:

from urllib.request import urlopen
urlopen(url)

By the way, I'd suggest another module called requests, which is friendlier to use. You can install it with pip and use it like this:

import requests
requests.get(url)
requests.post(url)

I find it easy to use, and I'm a beginner too.... hahah

— jason.lu

-1

import urllib.request
from bs4 import BeautifulSoup


with urllib.request.urlopen("http://www.newegg.com/") as url:
    s = url.read()
    print(s)

# Parse the downloaded HTML and collect the first 10 <a> tags
soup = BeautifulSoup(s, "html.parser")
all_tag_a = soup.find_all("a", limit=10)

for link in all_tag_a:
    #print(link.get('href'))
    print(link)

— user11649630