You shouldn't be generating the text in advance and then processing it. You should be dynamically generating the text in memory, so you basically only have to worry about the memory for one text file at a time.
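To make the in-memory idea concrete, here is a minimal sketch: generate each document lazily with a generator and process it immediately, so only one document exists in memory at a time. `make_document` and `word_count` are hypothetical stand-ins for your real generation and analysis steps.

```python
def make_document(seed):
    """Pretend text generator -- replace with your real one."""
    return f"document {seed} " * seed

def documents(n):
    # A generator: each document exists only while it is being processed.
    for seed in range(1, n + 1):
        yield make_document(seed)

def word_count(text):
    return len(text.split())

# Stream the documents through the analysis without ever writing them out;
# peak memory is one document, not the whole corpus.
totals = sum(word_count(doc) for doc in documents(1000))
```

The same pattern works with any per-document analysis (tokenizing, tagging, sentiment scoring) plugged in where `word_count` is.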
As for visualizations, R with ggplot2 may work (R also handles text and data munging, sentiment analysis, etc.), and as a social scientist you may find it worth learning.
ggplot2 has a python port.
That said, you are probably using NLTK, right? There are some drawing tools in nltk.draw. There is probably also a users' mailing list for whatever package or tool you are using; consider asking there.
I worked for an NLP research think tank for a while, and we always wrote text files as intermediate steps between each part of our system. It was basically a cache of each step, so you could restart the pipeline from the last step that had succeeded.
Hard drive space is cheap. Use as much as you want.
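A minimal sketch of that file-cache pattern: each step writes its output to disk, and on a rerun any step whose output file already exists is skipped, so you restart from wherever the pipeline last got to. The step names and transform functions here are made up for illustration.

```python
import os
import string

def cached_step(name, func, in_path, out_path):
    # If this step already ran, reuse its cached output file.
    if os.path.exists(out_path):
        return out_path
    with open(in_path) as f:
        result = func(f.read())
    with open(out_path, "w") as f:
        f.write(result)
    return out_path

# Two toy pipeline steps: lowercase the text, then strip punctuation.
def lowercase(text):
    return text.lower()

def strip_punct(text):
    return text.translate(str.maketrans("", "", string.punctuation))

# Usage (paths are hypothetical):
# cached_step("lowercase", lowercase, "raw.txt", "step1.txt")
# cached_step("strip_punct", strip_punct, "step1.txt", "step2.txt")
```

Deleting one intermediate file forces just that step (and the ones after it) to rerun, which is the whole appeal of the approach.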
Making it clear that he doesn't need to buy equipment is a good thing. I agree with you that logging results as you go is worthwhile, but for data munging I think it's better to keep your data in its original source and document, in code, how you get it into the system, rather than requiring somebody reproducing your results to have a huge HD or buy something.
As an aside, I was also a social scientist originally; my first degree was in Psychology. The first time I felt like a programmer was when I took supplied R code that would have taken 8+ days to finish (2400 Rasch scores at 5 minutes each) and got the whole thing to run in less than a minute, by moving from a sequential search of every possibility to a probing strategy that finds the score best fitting the curve. Learning how to make your code more efficient, using less space or time through a better algorithm, is both useful in its own right and intellectually rewarding.
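The probing idea can be sketched in a few lines. Assuming the fit error is unimodal in the score (one best value, error rising on either side), a ternary search needs only O(log(range/tolerance)) evaluations instead of one per candidate. The error function below is a made-up stand-in, not the actual Rasch fit.

```python
def fit_error(theta):
    # Hypothetical unimodal error curve with its minimum at theta = 1.37.
    return (theta - 1.37) ** 2

def ternary_search(f, lo, hi, tol=1e-6):
    # Each iteration discards a third of the interval, so ~60 probes cover
    # a range that a fine-grained sequential scan would take millions for.
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

best = ternary_search(fit_error, -10, 10)
```

The unimodality assumption is what licenses throwing away part of the interval each step; if the error surface has multiple local minima, a coarse scan followed by local probing is the safer variant.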
R is going to give you some headaches, as it relies heavily on the local machine's memory. Using RStudio on a beefed-up AWS instance might help make calculation time a bit more palatable.