
On Sep 20, 9:04 pm, crybaby <joemystery...@gmail.com> wrote:
> I need to traverse an HTML page with a big table that has many rows and
> columns. For example, how do I go to the 35th td tag and use a regex to retrieve
> the content? After that is done, how do I move down to the 15th td tag after the
> 35th tag (35+15) and use a regex to retrieve its content?

1) You can find your table using one of these methods:

a)
target_table = soup.find('table', id='car_parts')

b)
tables = soup.findAll('table')
target_table = tables[2]

findAll() returns the tables in a list, in the order that they appear on the
page, so tables[2] is the third table.


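2) You can get all the td's in the table using this statement:

all_tds = target_table.findAll('td')

findAll() searches recursively, so the cells of any table nested inside
target_table are included as well, in document order.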


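3) Because list indexes start at 0, the 35th td is all_tds[34], and the td
15 cells further on (35+15 = the 50th) is all_tds[49]. You can get their
contents using these statements:

print all_tds[34].string
print all_tds[49].string

.string is None when a cell contains more than one child (for example, text
plus a <b> tag), so check for that before running your regex on the result.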
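The regex part of the question can be sketched in plain Python, so it runs under either Python 2 or 3 without BeautifulSoup. This is a minimal sketch under assumptions: `cells` is a made-up stand-in for `[td.string for td in all_tds]`, `nth_cell` and `extract_pair` are hypothetical helper names, and the price pattern is only an illustration of "do a regex on the content".

```python
import re

def nth_cell(cells, position):
    """Return the cell at a 1-based position: the 35th cell is cells[34]."""
    return cells[position - 1]

def extract_pair(cells, start, step, pattern):
    """Run `pattern` (which must contain one capturing group) on the
    start-th cell and on the cell `step` positions further on."""
    regex = re.compile(pattern)
    results = []
    for position in (start, start + step):
        # .string is None when a <td> holds mixed markup, so fall back to ""
        text = nth_cell(cells, position) or ""
        match = regex.search(text)
        results.append(match.group(1) if match else None)
    return results

# Hypothetical stand-in for [td.string for td in all_tds]
cells = ["item %d: $%d.00" % (i, i * 10) for i in range(1, 61)]
print(extract_pair(cells, 35, 15, r"\$(\d+)\.\d\d"))  # ['350', '500']
```

Compiling the pattern once and reusing it keeps the loop cheap if you extend it to walk many offsets through a large table.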


Here is an example:

from BeautifulSoup import BeautifulSoup

doc = """
<html>
<head>
<title></title>
</head>
<body>
<table>
</table>

<table>
<tr><td>hello</td></tr>
<tr><td>world</td><td>goodbye</td></tr>
</table>
</body>
</html>
"""

soup = BeautifulSoup(doc)

tables = soup.findAll('table')
target_table = tables[1]    # the second table; tables[0] is the empty one

all_tds = target_table.findAll('td')    # hello, world, goodbye, in document order
print all_tds[0].string
print all_tds[2].string

--output:--
hello
goodbye
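If BeautifulSoup isn't available, roughly the same result can be had from the standard library. This is a Python 3 sketch using html.parser (the module is called HTMLParser in Python 2), not BeautifulSoup; the class name TableCells is made up, and it assumes tables are not nested and every td sits inside a table.

```python
from html.parser import HTMLParser

class TableCells(HTMLParser):
    """Collect the text of every <td>, grouped by the <table> it belongs to."""

    def __init__(self):
        super().__init__()
        self.tables = []      # one list of cell strings per <table>, in page order
        self._in_td = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "td":
            self._in_td = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            # Assumes the <td> is inside a table, so self.tables is non-empty
            self.tables[-1].append("".join(self._buf).strip())
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)

doc = """<html><body>
<table></table>
<table>
<tr><td>hello</td></tr>
<tr><td>world</td><td>goodbye</td></tr>
</table>
</body></html>"""

parser = TableCells()
parser.feed(doc)
target = parser.tables[1]   # second table, as in the BeautifulSoup example
print(target[0])            # hello
print(target[2])            # goodbye
```

BeautifulSoup is still the more forgiving choice for messy real-world HTML; this version only shows that the "tables in page order, cells in document order" idea doesn't depend on it.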
