Q: How to get a textarea value using lxml in Python

Stack Overflow user

Asked 2015-04-08 18:24:08

3 answers, 1K views, 0 followers, 2 votes

With this Python code I can fetch the full HTML source of the page.

import mechanize
import lxml.html

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]

# Open the login URL.
sign_in = br.open("http://target.co.uk")

# Select the form by index. The page has only one form, so nr=0.
# If the form has a name attribute, it can be selected by name instead.
br.select_form(nr=0)

# "username" and "password" are the names of the form's input fields.
br["username"] = "myusername"
br["password"] = "myp4sw0rd"

# Submit the credentials and read the body of the page we are redirected to.
logged_in = br.submit()
logincheck = logged_in.read()  # bytes under Python 3
if b"logout" in logincheck:
    print("Login success, you just logged in.")
else:
    print("Login failed")

# Other URLs can be opened the same way once logged in.
coding1_content = br.open("https://www.target.co.uk/levels/coding/1").read()

# Parse the page and print the text of every textarea without a name attribute.
tree = lxml.html.fromstring(coding1_content)
for ta in tree.findall(".//textarea"):
    if not ta.get("name"):
        print(ta.text)

if b"textarea" in coding1_content:
    print("Textarea found.")
else:
    print("Textarea not found.")

But what I need is the content of the first textarea tag that has no name attribute. My HTML source looks like this:

........
........
<textarea>this, is, what, i, want</textarea>
<textarea name="answer">i don't need it</textarea>
........
........

Any help would be appreciated.

python

lxml

lxml.html

3 Answers

Stack Overflow user

Answered 2015-04-08 18:36:48

According to the lxml documentation, you can reach the forms of an HTML document through its forms attribute:

from lxml.html import fromstring

form_page = fromstring('''some html code with a <form>''')
form = form_page.forms[0]  # the first form in the document
form.fields  # dict-like view of the form's named fields

For more information, see the "Forms" section of http://lxml.de/lxmlhtml.html
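As a rough sketch of how the forms API might be applied to the question's markup: fields is keyed by name, so the unnamed textarea does not appear there and has to be picked out by iterating form.inputs instead. The sample HTML string and the variable names below are assumptions made for illustration, not part of the original answer.

```python
import lxml.html

# Sample markup modelled on the question; the real page may differ.
HTML = """
<html>
  <body>
    <form>
      <textarea>this, is, what, i, want</textarea>
      <textarea name="answer">i don't need it</textarea>
    </form>
  </body>
</html>
"""

page = lxml.html.fromstring(HTML)
form = page.forms[0]  # first form in the document

# form.fields is keyed by name, so only the named textarea shows up here.
named_fields = sorted(form.fields.keys())

# form.inputs iterates over every input, select and textarea in the form,
# named or not, so the unnamed textarea can be filtered out of it.
unnamed_values = [
    el.value
    for el in form.inputs
    if el.tag == "textarea" and el.get("name") is None
]

print(named_fields)
print(unnamed_values)
```

The first list holds only the named field, while the second holds the text of the textarea that has no name attribute.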

Votes: 1

Stack Overflow user

Answered 2015-04-08 18:31:07

If the HTML is

<html>
  <body>
    <form>
      <textarea>this, is, what, i, want</textarea>
      <textarea name="answer">i don't need it</textarea>
    </form>
  </body>
</html>

you can get the textarea content like this:

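import io
import lxml.html

html = "..."  # the HTML document shown above
tree = lxml.html.parse(io.StringIO(html))
# findall() expects a relative path, so the expression starts with "."
for ta in tree.findall(".//textarea"):
    # Skip textareas that carry a name attribute.
    if not ta.get("name"):
        print(ta.text)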

Output:

this, is, what, i, want
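Since the question asks only for the first unnamed textarea, the search can stop at the first match. The following is a minimal sketch, assuming the same sample HTML as in this answer; the helper name first_unnamed_textarea is hypothetical and not part of lxml.

```python
import lxml.html


def first_unnamed_textarea(html):
    """Return the text of the first <textarea> without a name attribute, or None."""
    root = lxml.html.fromstring(html)
    # iter() walks the tree in document order, so the first hit is the first textarea.
    for ta in root.iter("textarea"):
        if not ta.get("name"):
            return ta.text
    return None


sample = """
<html>
  <body>
    <form>
      <textarea>this, is, what, i, want</textarea>
      <textarea name="answer">i don't need it</textarea>
    </form>
  </body>
</html>
"""

print(first_unnamed_textarea(sample))
```

Returning None when nothing matches lets the caller tell "no unnamed textarea on the page" apart from a match.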

Votes: 0

Stack Overflow user

Answered 2015-04-08 19:35:43

Another option is to select every <textarea> that has no name attribute by using the xpath() method:

.....
for t in tree.xpath(".//textarea[not(@name)]"):
    print(t.text)

findall() supports only a subset of the XPath language, whereas xpath() supports XPath 1.0 in full. For example, xpath() accepts the not() function used here, while findall() does not.
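The difference can be sketched side by side. The sample markup and variable names below are assumptions for illustration; the try/except reflects the expectation that findall() rejects the not() predicate with a SyntaxError, in which case the same filter has to be written in Python.

```python
import lxml.html

# Sample markup modelled on the question; an assumption for illustration.
sample = (
    "<html><body><form>"
    "<textarea>this, is, what, i, want</textarea>"
    '<textarea name="answer">i don\'t need it</textarea>'
    "</form></body></html>"
)
root = lxml.html.fromstring(sample)

# xpath() evaluates full XPath 1.0, so the not() function is available.
via_xpath = [t.text for t in root.xpath(".//textarea[not(@name)]")]

# findall() uses the smaller ElementPath syntax; it is expected to reject
# the same predicate with a SyntaxError.
try:
    root.findall(".//textarea[not(@name)]")
    findall_accepts_not = True
except SyntaxError:
    findall_accepts_not = False

# With findall(), the attribute check moves into Python code instead.
via_findall = [t.text for t in root.findall(".//textarea") if t.get("name") is None]

print(via_xpath)
print(via_findall)
print(findall_accepts_not)
```

Both approaches should select the same element; xpath() simply pushes the filter into the expression.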

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/29512047
