

Ask HN: Tips for parsing data in python?  - armenarmen

I'm a noobish programmer and I would appreciate any thoughts on how to go about parsing aluminum prices with Python.
======
canatan01
Either do it yourself using the urllib and re modules, or use BeautifulSoup as
has already been suggested. I would use urllib/re, because I think that way you
will learn more about Python (and about regular expressions).
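A minimal sketch of the urllib/re approach, assuming Python 3 (where the fetching code lives in `urllib.request`). `ROW_PATTERN`, `fetch_html`, and `extract_prices` are hypothetical names, and the pattern assumes each price sits in a two-cell table row (a label cell, then a number cell). The real page's markup may well differ, so treat the regex as a starting point rather than a finished scraper.

```python
import re
import urllib.request

# Hypothetical pattern: assumes rows shaped like
# <td>Aluminum Cash</td><td>2,345.50</td>. Adjust it to the real page's markup.
ROW_PATTERN = re.compile(
    r"<td[^>]*>\s*([^<]+?)\s*</td>\s*<td[^>]*>\s*([\d,]+(?:\.\d+)?)\s*</td>",
    re.IGNORECASE,
)


def fetch_html(url):
    """Download a page and decode it as text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_prices(html):
    """Return (label, price) pairs matched by ROW_PATTERN, prices as floats."""
    prices = []
    for label, value in ROW_PATTERN.findall(html):
        # Drop thousands separators before converting, e.g. "2,345.50" -> 2345.5
        prices.append((label, float(value.replace(",", ""))))
    return prices


if __name__ == "__main__":
    sample = "<tr><td>Aluminum Cash</td><td>2,345.50</td></tr>"
    print(extract_prices(sample))  # [('Aluminum Cash', 2345.5)]
```

On a live page you would call `extract_prices(fetch_html(url))`; regexes over HTML break easily when the layout changes, which is the usual argument for a real parser.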

------
jnazario
pretty broad question. what format is your data in? structured or
unstructured?

structured would be, for example, XML or even CSV. unstructured would be data
sitting in an HTML web page.

XML? use one of python's XML parsers (i like xml.dom.minidom myself, plenty of
good examples online). CSV? even easier, the csv module has good documentation
and examples.
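A rough sketch of both standard-library routes, built on made-up data: the `<price>` element, its `metal`/`date` attributes, and the CSV column names are hypothetical, not the format of any real price feed. Both helpers return the same list of dicts, which makes it easy to see that only the parsing step changes between formats.

```python
import csv
import io
from xml.dom import minidom

# Hypothetical XML layout; a real feed would use its own element names.
SAMPLE_XML = """<prices>
  <price metal="aluminum" date="2010-01-04">2345.50</price>
  <price metal="aluminum" date="2010-01-05">2360.00</price>
</prices>"""

# Hypothetical CSV layout with a header row.
SAMPLE_CSV = (
    "date,metal,price\n"
    "2010-01-04,aluminum,2345.50\n"
    "2010-01-05,aluminum,2360.00\n"
)


def prices_from_xml(text):
    """Read every <price> element into a dict using xml.dom.minidom."""
    doc = minidom.parseString(text)
    rows = []
    for node in doc.getElementsByTagName("price"):
        rows.append({
            "date": node.getAttribute("date"),
            "metal": node.getAttribute("metal"),
            # The element's text content is its first child (a Text node).
            "price": float(node.firstChild.data),
        })
    return rows


def prices_from_csv(text):
    """Read CSV rows into dicts; DictReader uses the header row as keys."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"date": row["date"], "metal": row["metal"], "price": float(row["price"])}
        for row in reader
    ]
```

For a file on disk, `minidom.parse(path)` and `csv.DictReader(open(path, newline=""))` replace the string-based inputs.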

HTML? trickier. i like BeautifulSoup still, although other people prefer other
tools for getting at the data.

care to elaborate?

~~~
armenarmen
I suppose it is super broad. To be honest I really have no idea; again, I am
still quite new to coding.

I'll check out what you've recommended. I want to be pulling prices from here:
<http://www.metalprices.com/FreeSite/metals/al/al.asp>

~~~
jnazario
had a look. you'll want to parse those tables using a technique broadly known
as "screen scraping". various python libraries and tools can do it.

basically walk the document and find the tables you wish to scrape, check the
table name (the red bar), then keep track of the values in some python data
structure (e.g. a list of dicts, a list of lists, etc.) and access them that
way. that's the basic strategy.
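The strategy above can be sketched with the standard library's `html.parser` module in place of BeautifulSoup (a swapped-in tool, chosen only so the example has no third-party dependency). It assumes the table's title (the red bar) is the text of the table's first row and that the next row holds column headers; `TableCollector`, `find_table`, and `rows_to_dicts` are hypothetical names, and the sample markup is invented rather than copied from metalprices.com. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Walk an HTML document and record every <table> as a list of rows.

    Each row is a list of cell strings. Assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td>/<th> currently open

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def find_table(tables, title):
    """Return the rows after the title row of the first table whose
    first row contains `title` (case-insensitive), or None."""
    for rows in tables:
        if rows and title.lower() in " ".join(rows[0]).lower():
            return rows[1:]
    return None


def rows_to_dicts(rows):
    """Use the first row as column headers and the rest as records."""
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


# Invented markup standing in for one of the page's price tables.
SAMPLE = """
<table>
  <tr><td colspan="2">Aluminum Cash Price</td></tr>
  <tr><th>Date</th><th>Price</th></tr>
  <tr><td>01/04/2010</td><td>2,345.50</td></tr>
  <tr><td>01/05/2010</td><td>2,360.00</td></tr>
</table>
"""

if __name__ == "__main__":
    parser = TableCollector()
    parser.feed(SAMPLE)
    parser.close()
    print(rows_to_dicts(find_table(parser.tables, "aluminum cash")))
```

Prices come back as strings such as `'2,345.50'`, so strip the commas before calling `float()` if you need numbers.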

