

Anyone wanna help me learn Python? - Random_Person

Well, I've wanted to get back into programming for years, but just haven't found a need. I've been playing with Java and Python tutorials for a few months but haven't ventured out and written any real code yet.

Today, an opportunity presented itself for me to write what I assumed would be a quick little script, yet it has turned into a few hours of frustration. This is something I used to be able to do in Basic/VB in a few minutes, but I can't even get Python to open a file.

What I've got:
large plain txt document
space delimited

What I need:
Import the text, line for line, and output to another file with HTML tags.

Simple, right?
I understand the theory of how to do this, but can't get it to work.

Open the original file as read. Open the output file as append. I can't even get past that basic concept.

Then, for loop for each line of original?
make string variable for line
w.write to output file with my HTML code and text from original file?

I believe I'm on the right track, but have no idea how to write this dang thing. Documentation isn't helping as I can't even seem to get the dang open command to work.

Any help?
======
jiaaro
opening and reading a file:

    
    
        in_file = open("/path/to/file.txt", mode="r")
        out_file = open("/path/to/file.html", mode="w")
            
        out_file.write("<html><head><title>my doc</title></head><body>")
        
        for line in in_file.readlines():
            line = "<p>" + line + "</p>"
            out_file.write(line)
            
        out_file.write("</body></html>")
    
        in_file.close()
        out_file.close()
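
The same loop can also be pulled into a small function, which makes it easy to try without touching real files. This is only a sketch, assuming Python 3; `lines_to_html` is a made-up name, not something from the standard library. It also drops the trailing newline so it doesn't end up inside the `<p>` tag:

```python
def lines_to_html(lines, title="my doc"):
    """Wrap each input line in <p> tags and return one HTML string."""
    parts = ["<html><head><title>" + title + "</title></head><body>\n"]
    for line in lines:
        # Each line read from a file still ends with "\n"; drop it first.
        parts.append("<p>" + line.rstrip("\n") + "</p>\n")
    parts.append("</body></html>\n")
    return "".join(parts)


html = lines_to_html(["first line\n", "second line\n"])
print(html)
```

Any iterable of strings works as `lines`, including an open file object.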
    

a few tips:

getting the attributes of anything at all: dir()

using the file object above:

    
    
        >>> dir(in_file)
        ... 
        ['__class__',
         '__delattr__',
         '__doc__',
         '__enter__',
         '__exit__',
         '__format__',
         '__getattribute__',
         '__hash__',
         '__init__',
         '__iter__',
         '__new__',
         '__reduce__',
         '__reduce_ex__',
         '__repr__',
         '__setattr__',
         '__sizeof__',
         '__str__',
         '__subclasshook__',
         'close',
         'closed',
         'encoding',
         'errors',
         'fileno',
         'flush',
         'isatty',
         'mode',
         'name',
         'newlines',
         'next',
         'read',
         'readinto',
         'readline',
         'readlines',
         'seek',
         'softspace',
         'tell',
         'truncate',
         'write',
         'writelines',
         'xreadlines']
    

getting the doc string: help()

    
    
        >>> help(in_file.write)
        ... Help on built-in function write:
    
        write(...)
            write(str) -> None.  Write string str to file.
    
            Note that due to buffering, flush() or close() may be needed before
            the file on disk reflects the data written.
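
The double-underscore names make the dir() output noisy. A small sketch for listing only the public attributes, assuming Python 3 and using an in-memory `io.StringIO` as a stand-in for a real file (the helper name `public_names` is hypothetical):

```python
import io


def public_names(obj):
    """Return the attribute names of obj that don't start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


fake_file = io.StringIO("hello\n")  # behaves like an open text file
names = public_names(fake_file)
print(names)
```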

~~~
Random_Person
Thanks! I figured out the open problem, which was some sort of unicode issue
where I had the file saved in c:\Users\.. and I guess when it hit the \U it
generated an escape. That's a pretty weird function. So, I moved the file,
renamed it, and that fixed the issue.

So, on my own, I had gotten as far as starting the for loop and I was unaware
that 'line' actually gave me a workable string. What I was working on was
this:

    
    
      for line in in_file:
         tx = in_file.readlines()
    

because my output needs to be pieces of the line instead of the whole line at
a time-- so something like this:

    
    
      for line in in_file:
         tx = in_file.readlines()
         out_file.write("<tr bgcolor="006699">"
         out_file.write("<td>" + tx[1,10] + "</td>")
         out_file.write("<td>" + tx[12, 23] + "</td>")
    

and so on.
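
For reference, a slice uses a colon rather than a comma, and `line` inside the loop is already the string to slice, so no extra readlines() call is needed. A rough sketch, assuming Python 3; the sample text and column positions are made up for illustration:

```python
import io

# Stand-in for the real input file; the column widths are invented for the demo.
sample = ("Kanawha".ljust(11) + "someone@example.com\n"
          + "Cabell".ljust(11) + "other@example.com\n")
in_file = io.StringIO(sample)

rows = []
for line in in_file:
    # `line` is already a string; line[a:b] takes characters a up to, not including, b.
    county = line[0:11].rstrip()
    email = line[11:].rstrip()
    rows.append("<tr><td>" + county + "</td><td>" + email + "</td></tr>")

print("\n".join(rows))
```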

~~~
zephyrfalcon
This is more a question for StackOverflow, but here goes...

Since Windows paths use backslashes, you should either escape all of them (not
just the \U), or use raw strings:

    
    
      filename = "c:\\Users\\foo.txt"
      filename = r"c:\Users\foo.txt"
    

As it happens, Unix-style forward slashes usually work fine too on Windows:

    
    
      filename = "c:/Users/foo.txt"
    

This way, you don't have to rename your files and directories. :-)
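
A quick way to see the difference, assuming Python 3. The path is only an example; `\t` and `\n` are used because they are valid escapes, so the unescaped spelling goes wrong silently instead of raising an error:

```python
plain = "c:\temp\notes.txt"      # \t becomes a tab, \n becomes a newline
escaped = "c:\\temp\\notes.txt"  # every backslash doubled
raw = r"c:\temp\notes.txt"       # raw string: backslashes left alone
forward = "c:/temp/notes.txt"    # forward slashes need no escaping

print(repr(plain))
print(escaped == raw)
print(len(plain), len(raw))
```

The escaped and raw spellings produce the same string; the plain one is two characters shorter because each escape collapses into a single character.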

Also, if the contents of the line are delimited by spaces, consider the
split() method:

    
    
      "a b c".split()
      => ["a", "b", "c"]
    

If the format isn't actually as easy as that, take a look at the re and csv
modules.
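
If every line has the same number of space-separated fields, the pieces from split() can be assigned straight to their own variables. A small sketch, assuming Python 3; the sample line and field names are made up:

```python
line = "Kanawha   someone@example.com   25301\n"

# split() with no arguments splits on any run of whitespace
# and ignores leading/trailing whitespace, including the newline.
county, email, zipcode = line.split()
print(county, email, zipcode)

# maxsplit keeps the remainder together when a later field contains spaces.
first, rest = "Kanawha   Jane Q. Public".split(None, 1)
print(first, "|", rest)
```

Note that the unpacking raises ValueError when a line has a different number of fields, which is one reason to reach for re or csv on messier input.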

~~~
Random_Person
Too awesome. As a linux user, I feel ashamed that I never tried to use forward
slashes in Windows.

I really like that split thing. I'll have to do some more research into it.
Figuring out how to do that and assign each split to its own variable and
all...

Thanks a ton!

------
Random_Person
Thanks a ton for the help guys! I have officially written my first Python code
to save me a day of typing and I can't thank you enough. Not only have you
saved my fingers, you have helped to spark the proverbial fire under my butt
with regards to learning Python. I've been removed from programming for so
long that I forgot how neat it was to write something and watch it go.

Here is the final script for anyone who cares. Can it be optimized?

    
    
      in_file = open('c:/WVASA/original.txt', 'r')
      out_file = open('c:/WVASA/output.txt', 'w')
    
      for line in in_file.readlines():
          county = line[:11].rstrip()
          email = line[12:42].rstrip()
          name = line[43:72].rstrip()
          street = line[73:98].rstrip()
          city = line[99:117].rstrip()
          zipcode = line[118:].rstrip()
        
          out_file.write('  <tr bgcolor="#006699">\n')
          out_file.write('    <td>' + county + '</td>\n')
          out_file.write('    <td>' + name + '</td>\n')
          out_file.write('    <td>' + street + '</td>\n')
          out_file.write('    <td>' + city + '</td>\n')
          out_file.write('    <td>' + zipcode + '</td>\n')
          out_file.write('    <td><a href="mailto:' + email + '">' + email + '</a></td>\n')
          out_file.write('  </tr>\n')
      
      
      in_file.close()
      out_file.close()

~~~
Deejahll
- readlines() is redundant for filehandle objects, you can omit it.

- Prefer Python's string formatting operator. I even prefer the extended
dict-compatible syntax.

- close() is called automatically when the filehandles are garbage-collected;
you can usually omit it for tasks like this.

- ranges (like those in brackets) are exclusive, not inclusive. So I suspect
you have an off-by-one error there.

Thus:

    
    
        in_file = open('c:/WVASA/original.txt', 'r')
        out_file = open('c:/WVASA/output.txt', 'w')
    
        template = '''
            <tr bgcolor="#006699">
                <td>%(county)s</td>
                <td>%(name)s</td>
                <td>%(street)s</td>
                <td>%(city)s</td>
                <td>%(zipcode)s</td>
                <td><a href="mailto:%(email)s">%(email)s</a></td>
            </tr>'''
        template = template.strip()
    
        for line in in_file:
            out_file.write(template % {
                'county': line[:12].rstrip(),
                'email': line[12:43].rstrip(),
                'name': line[43:73].rstrip(),
                'street': line[73:99].rstrip(),
                'city': line[99:118].rstrip(),
                'zipcode': line[118:].rstrip(),
            })
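
On the ranges point: a slice `s[a:b]` includes index `a` and stops just before index `b`, so adjacent fields can share a boundary number without overlapping. A quick check, assuming Python 3 and a made-up sample string:

```python
s = "abcdefgh"

left = s[:3]     # indexes 0, 1, 2
middle = s[3:6]  # indexes 3, 4, 5
right = s[6:]    # index 6 to the end

print(left, middle, right)
# The pieces join back into the original with nothing lost or repeated.
print(left + middle + right == s)
# The length of s[a:b] is b - a when both indexes are inside the string.
print(len(s[3:6]))
```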

~~~
Random_Person
As far as the ranges issue, that took some tweaking, but there was a good deal
of white space between columns, so I got it tweaked by trial and error. I had
no idea it was exclusive though. Is it exclusive on both ends?

I love the template thing. I've got to read up on that!

On close() - I found that if I didn't execute a close, and I opened the output
in a text editor and attempted to save it I got an error that it was being
accessed by another process.
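
That error is consistent with the script still holding the file open: on Windows an open handle can keep other programs from saving over the file. One way to avoid forgetting close() is a `with` block, which closes the file when the block ends, even if an error occurs inside it. A minimal sketch, assuming Python 3; the temporary directory and file names are placeholders:

```python
import os
import tempfile

# Placeholder paths in a throwaway directory, just for the demo.
folder = tempfile.mkdtemp()
in_path = os.path.join(folder, "original.txt")
out_path = os.path.join(folder, "output.txt")

with open(in_path, "w") as f:
    f.write("one\ntwo\n")

with open(in_path, "r") as in_file, open(out_path, "w") as out_file:
    for line in in_file:
        out_file.write("<td>" + line.rstrip() + "</td>\n")

# Both handles are closed here without an explicit close() call.
print(in_file.closed, out_file.closed)
```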

I love this community, you guys are awesome. Thank you.

------
Random_Person
Alright, I'm so dang close. With just the issue of write. Here's what I've got
so far:

    
    
      in_file = open('c:\original.txt', 'r')
      out_file = open('c:\output.txt', 'w')
      
      for line in in_file.readlines():
          out_file.write('<tr bgcolor="006699">')
          out_file.write('  <td>' + line[0:12] + '</td>')
      
      
      in_file.close()
      out_file.close()
    
    

There will be more text written per execution of the loop, but I can handle
that coding. :)

What I can't grasp is why write doesn't start a newline and what I can do to
remedy that. Currently this is just making a huge line of text instead of
individual lines.

Thanks a ton!

~~~
Random_Person
Ugh. Forget I wrote that. :) Amazing what a \n can do!
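
For anyone else who trips on this: write() puts exactly the string it is given into the file and nothing more, so the line break has to be part of the string. A small sketch, assuming Python 3, with `io.StringIO` standing in for the output file:

```python
import io

out_file = io.StringIO()  # in-memory stand-in for a real output file

out_file.write("<tr>")
out_file.write("<td>no newline yet</td>")  # lands on the same line
out_file.write("</tr>\n")                  # "\n" ends the line explicitly
out_file.write("<tr></tr>\n")

result = out_file.getvalue()
print(result)
```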

Okay, so, stripping spaces from string values. Easy? If I import my line, I
have 6 strings I need to segment out of it and my file uses spaces. So, in my
previous line:

    
    
      out_file.write('  <td>' + line[0:12] + '</td>\n')
    

I'd like to strip the spaces from line[0:12]

Thanks!

~~~
zephyrfalcon

      line[0:12].strip()
    

but see my other comment about str.split().
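
To see what each variant removes, here is a quick sketch assuming Python 3 and a made-up padded field. strip() trims whitespace from both ends, rstrip() only from the right, lstrip() only from the left; none of them touch spaces inside the text:

```python
field = "  Kanawha County   "

both = field.strip()    # trims both ends
right = field.rstrip()  # trims the right end only
left = field.lstrip()   # trims the left end only

print(repr(both))
print(repr(right))
print(repr(left))
```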

------
CyberFonic
Hang in there, it will be worthwhile!

I too had similar problems getting a hang of Python. "Dive into Python" was a
help. But once it 'clicked' it became hard to write anything that didn't work.
One thing that I still marvel at is that you can often just make a guess and
it works! The collection of 'batteries' is very useful. Typing
help(some_function_or_class) at the command prompt is often very useful.

NB: if you are writing for Windows _and_ Linux and OS X, then you should use
the functions in the os.path module to avoid hard coding the directory and
filename conventions.
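
A small sketch of what that looks like, assuming Python 3; the folder and file names are hypothetical and only there for illustration:

```python
import os.path

# Hypothetical folder and file names, only for illustration.
folder = os.path.join("WVASA", "data")
in_path = os.path.join(folder, "original.txt")

print(in_path)                    # separator depends on the platform
print(os.path.basename(in_path))  # file name without the folders
print(os.path.splitext(in_path))  # (path without extension, extension)
```

os.path.join picks the separator for whatever system the script runs on, so the same code produces backslashes on Windows and forward slashes elsewhere.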

~~~
Random_Person
I played with Dive Into for a while. The problem has been that I just haven't
had anything I've needed to code for a long time. Being a once-programmer,
I've always arrogantly assumed that I could just jump back into coding. What
I'm finding, though, is that nearly 10 years off was just a little too much. :)

This was a one-off for that particular task, so no cross platform writing, but
I'll certainly use os.path in the future. Thanks for that!

