

Ask HN: How to run a rails web crawler on aws - wenqin123

Could someone be nice enough to point me in the right direction? I've got my web crawler working locally but now I want to put it onto an EC2 instance; however, I have no idea where to start.

Thanks!
======
pjungwir
Setting up Rails projects to run in production is my specialty. :-) Some
people love it, some hate it, but it's a good chance to learn some Linux and
sysadmin skills.

I'd advise you to do it sloppy the first time. Don't worry about configuration
management, containers, and all that stuff, just learn how to set up the
system. So things you'll have to do:

\- Set up a `deployer` user account. (Actually I like to name these after the
application they're responsible for.)

\- Install rbenv and your Ruby of choice.

\- Install nginx or Apache.

\- Install a Rails app server. I like Unicorn, but if you go with Phusion
Passenger, there's no separate process to manage: it just launches via your
web server.

\- Install your database and give it some initial contents.

\- Add Capistrano to your Gemfile, write a cap script, get it working.

\- Since you say this is a web crawler I assume background jobs are important.
So if you're using Resque or Sidekiq you'll need to install Redis.
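If you pick Unicorn, it is configured with a small Ruby file. What follows is a minimal sketch, not a tuned production config. It assumes the app is deployed to a hypothetical `/home/deployer/myapp/current` (the layout Capistrano creates) and that nginx proxies to the Unix socket named here. The `.oldbin` block is the common pattern for shutting down the old master after a zero-downtime `USR2` restart.

```ruby
# config/unicorn.rb: minimal Unicorn configuration sketch.
# Hypothetical assumptions: app lives in /home/deployer/myapp/current and
# nginx proxies to the Unix socket below.
app_root = "/home/deployer/myapp/current"

worker_processes 2                 # roughly one per CPU core on the instance
working_directory app_root
listen "/tmp/unicorn.myapp.sock", backlog: 64
timeout 30                         # kill workers stuck for more than 30 seconds
pid "#{app_root}/tmp/pids/unicorn.pid"
stderr_path "#{app_root}/log/unicorn.stderr.log"
stdout_path "#{app_root}/log/unicorn.stdout.log"

# Load the app once in the master so forked workers share memory.
preload_app true

before_fork do |server, _worker|
  # The master's DB connection must not be shared with forked workers.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!

  # After a USR2 restart the old master's pid file is renamed to "<pid>.oldbin";
  # ask that old master to QUIT once the new master starts forking workers.
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # The old master is already gone.
    end
  end
end

after_fork do |_server, _worker|
  # Each worker opens its own DB connection.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
```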
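For the Capistrano step, a first cap script can be quite small. This is a sketch for Capistrano 3. The application name, Git URL, deploy path and host are hypothetical placeholders: 203.0.113.10 is a documentation address, so use your EC2 instance's public DNS name or IP instead. The `server` line conventionally lives in the per-stage file `config/deploy/production.rb`.

```ruby
# config/deploy.rb: minimal Capistrano 3 sketch (placeholder values).
set :application, "myapp"
set :repo_url, "git@github.com:example/myapp.git"
set :deploy_to, "/home/deployer/myapp"

# Kept in shared/ on the server and symlinked into every release.
set :linked_files, %w[config/database.yml]
set :linked_dirs, %w[log tmp/pids tmp/sockets vendor/bundle]

# config/deploy/production.rb: the EC2 host and the account cap logs in as.
server "203.0.113.10", user: "deployer", roles: %w[app db web]
```

Running `cap production deploy:check` first reports missing directories or linked files before you try a real `cap production deploy`.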
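Resque and Sidekiq share a producer/consumer shape. The app (or a cron job) pushes crawl jobs onto a queue stored in Redis, and separate worker processes pop and run them. Redis is needed because the queue has to outlive any single process and be shared between the web app and the workers. The sketch below imitates that shape with Ruby's built-in `Queue` and threads so it runs without Redis. It is not the Sidekiq or Resque API, `InMemoryJobQueue` is a hypothetical name, and unlike a Redis-backed queue its jobs vanish if the process dies.

```ruby
# A stdlib-only, in-memory stand-in for a Redis-backed job queue.
# InMemoryJobQueue is a hypothetical name; it is NOT the Sidekiq/Resque API.
class InMemoryJobQueue
  STOP = Object.new # sentinel telling a worker thread to exit

  # worker_count: number of worker threads; the block is the job to run.
  def initialize(worker_count: 2, &perform)
    raise ArgumentError, "a job block is required" unless perform

    @jobs = Queue.new
    @results = Queue.new
    @workers = Array.new(worker_count) do
      Thread.new do
        loop do
          job = @jobs.pop # blocks until a job (or STOP) is available
          break if job.equal?(STOP)

          @results << perform.call(job)
        end
      end
    end
  end

  # Producer side: push a job (e.g. a URL to crawl) onto the queue.
  def enqueue(job)
    @jobs << job
    self
  end

  # Queues one STOP per worker behind the pending jobs, waits for every
  # worker to finish, and returns all results. Result order is not
  # guaranteed because workers run concurrently. If a job raised, the
  # exception is re-raised here by Thread#join.
  def shutdown
    @workers.size.times { @jobs << STOP }
    @workers.each(&:join)
    Array.new(@results.size) { @results.pop }
  end
end
```

For a quick local try, pass a fake fetcher such as `InMemoryJobQueue.new(worker_count: 3) { |url| "fetched #{url}" }`, enqueue a few URLs, and call `shutdown` to collect the results.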

For bonus points:

\- If you have cron jobs, use `whenever` so they get configured every time you
run `cap`.

\- Set up SSL (install a certificate and have nginx or Apache serve HTTPS).

\- Install fail2ban.

\- Use unicorn instead of passenger.

\- Use a process manager like god (which is ruby-based) to control unicorn and
your background jobs. Btw if you do this, your cap tasks for restarting
unicorn, delayed_job, etc. should change to just
`sudo god restart myapp-unicorn`, not the direct commands.

\- Use chef-solo (also ruby-based) so you don't have to do it all by hand next
time. :-) Tip: to save time, run this against a vagrant VM until you get the
bugs out.

\- Start playing with more bits of the AWS ecosystem, e.g. add an ELB. Run
everything in a VPC instead of "EC2 Classic". Use CloudFormation to launch
instances and kick off Chef automatically (using chef-server instead of
chef-solo), or try out OpsWorks, but IMO it's a bit beta. Use RDS.
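A whenever schedule is a Ruby file that the gem turns into crontab entries. With whenever's Capistrano integration (`require "whenever/capistrano"` in the Capfile), the crontab is rewritten on each deploy. In this sketch the log path, the rake task `crawler:enqueue_seed_urls` and the `CrawledPage.delete_older_than` method are all hypothetical.

```ruby
# config/schedule.rb: read by the whenever gem (hypothetical task names).
set :output, "/home/deployer/myapp/shared/log/cron.log"

# Kick off a crawl every hour via a (hypothetical) rake task.
every 1.hour do
  rake "crawler:enqueue_seed_urls"
end

# Nightly cleanup via rails runner (hypothetical model method).
every 1.day, at: "4:30 am" do
  runner "CrawledPage.delete_older_than(30)"
end
```

Running `bundle exec whenever` in the app directory prints the generated crontab without installing it, which is a cheap way to check the schedule.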
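A god config is plain Ruby too. This sketch defines a watch named `myapp-unicorn`, so `sudo god restart myapp-unicorn` works, plus a Sidekiq watch for the background jobs. The paths, the `deployer` user and the rbenv shim location are hypothetical, and god must run as root for `uid`/`gid` to take effect. Start it with something like `sudo god -c /home/deployer/myapp/current/config/myapp.god`.

```ruby
# config/myapp.god: minimal god sketch (hypothetical paths and names).
app_root    = "/home/deployer/myapp/current"
unicorn_pid = "#{app_root}/tmp/pids/unicorn.pid"
bundle      = "/home/deployer/.rbenv/shims/bundle" # god doesn't load your shell PATH

God.watch do |w|
  w.name     = "myapp-unicorn"
  w.group    = "myapp"
  w.dir      = app_root
  w.env      = { "RAILS_ENV" => "production" }
  w.start    = "#{bundle} exec unicorn -c #{app_root}/config/unicorn.rb -E production -D"
  w.stop     = "kill -QUIT `cat #{unicorn_pid}`"
  # USR2 re-executes the master with the new code. The old master keeps
  # running until it gets QUIT, which is usually sent from a before_fork
  # hook in the Unicorn config.
  w.restart  = "kill -USR2 `cat #{unicorn_pid}`"
  w.pid_file = unicorn_pid # Unicorn daemonizes (-D), so god reads its pid file
  w.behavior(:clean_pid_file)
  w.uid = "deployer"
  w.gid = "deployer"
  w.keepalive # restart the process if it dies
end

God.watch do |w|
  w.name  = "myapp-sidekiq"
  w.group = "myapp"
  w.dir   = app_root
  w.env   = { "RAILS_ENV" => "production" }
  w.start = "#{bundle} exec sidekiq -e production" # foreground; god tracks the pid
  w.log   = "#{app_root}/log/sidekiq.log"
  w.uid   = "deployer"
  w.gid   = "deployer"
  w.keepalive
end
```

Because both watches share the `myapp` group, `sudo god restart myapp` restarts them together.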
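A first chef-solo recipe can simply replay a few of the manual steps. This sketch assumes Ubuntu/Debian package names, and the cookbook name `myapp` is a placeholder. rbenv, the database and the app itself would need more resources or community cookbooks.

```ruby
# cookbooks/myapp/recipes/default.rb: minimal chef-solo recipe sketch.
# Package names assume Ubuntu/Debian; "myapp" and "deployer" are placeholders.

# The application user from the first manual step.
user "deployer" do
  home "/home/deployer"
  shell "/bin/bash"
  manage_home true
end

# System packages that were installed by hand the first time.
%w[git nginx redis-server fail2ban].each do |pkg|
  package pkg
end

# Start the long-running services now and on every boot.
%w[nginx redis-server fail2ban].each do |svc|
  service svc do
    action [:enable, :start]
  end
end
```

Run it with `sudo chef-solo -c solo.rb -j node.json`, where `solo.rb` sets `cookbook_path` and `node.json` lists `recipe[myapp]` in its `run_list`. Pointing it at a Vagrant VM first lets you rerun it until it converges cleanly.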

Good luck, and have fun!

~~~
wenqin123
Thank you so much :)

