`pdftotext`, from http://www.foolabs.com/xpdf/
For OCR, `pdfimages` (also from xpdf) combined with ImageMagick's `convert` and `tesseract` (http://code.google.com/p/tesseract-ocr/) works passably well.
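For illustration, the pipeline above can be sketched as an ordered list of commands. The filenames (`scan.pdf`, the `page` prefix) are assumptions, not anything from the original comment:

```python
def ocr_pipeline_commands(pdf="scan.pdf", prefix="page"):
    """Return the shell commands, in order, for a simple OCR pass.

    Filenames here are illustrative; pdfimages writes page images as
    <prefix>-000.ppm, <prefix>-001.ppm, and so on.
    """
    return [
        # dump the PDF's embedded images as PPM files
        ["pdfimages", pdf, prefix],
        # older tesseract builds expect TIFF input, so convert each page
        ["convert", f"{prefix}-000.ppm", f"{prefix}-000.tif"],
        # OCR the first page image; tesseract appends .txt to the output base
        ["tesseract", f"{prefix}-000.tif", f"{prefix}-000"],
    ]
```

Only the first page is shown; a real script would loop over every `<prefix>-NNN.ppm` that `pdfimages` produced.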
1. Why return an array of texts? Where do the texts get split up? At page boundaries? Column boundaries? At the end of each line? If a line is interrupted by a corner of an image and continues a couple of inches afterward, does it get treated as a separate text? (I once used a PDF->text extractor that spit out every word separately, often in an incorrect order. That probably had to do with how the PDF was organized internally.)
2. "The PDF file should be smaller than 1 Mbit" -> You mean 1 megabyte, right? Because 1 megabit is only 125-128 kilobytes.
2. You're right, I meant megabyte.
Since a lot of PDFs are badly organized (and I wonder whether some programs do that deliberately to make text extraction difficult), perhaps you could analyze the location of each token on the page and merge the ones that seem to belong together. That alone would be 100x better than most of the free PDF->text converters out there.
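A toy sketch of that merging idea: group tokens whose positions sit on roughly the same baseline, then order each group left to right. The token format `(text, x, y)` and the line tolerance are assumptions for the sake of the example, not how any particular extractor works:

```python
def merge_tokens(tokens, line_tolerance=3.0):
    """tokens: list of (text, x, y) tuples; returns reconstructed lines.

    Tokens whose y coordinates differ by at most line_tolerance are
    treated as belonging to the same line.
    """
    lines = []  # each entry: (baseline_y, [(x, text), ...])
    for text, x, y in tokens:
        for baseline, words in lines:
            if abs(baseline - y) <= line_tolerance:
                words.append((x, text))
                break
        else:
            lines.append((y, [(x, text)]))
    lines.sort(key=lambda line: line[0])  # top of page first
    # within each line, sort by x so words read left to right
    return [" ".join(t for _, t in sorted(words)) for _, words in lines]
```

A real implementation would also need to handle multiple columns (cluster by x before merging by y), but even this naive pass reorders the scrambled single-word output described above.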
What if the document contains sensitive or privileged data?
If so, nice work!