

Ask HN: Techniques to go around IDS systems? - zaidf

For background on my site http://www.classhunt.com, read this: http://news.ycombinator.com/item?id=600478

Since I posted that, I had a meeting with the Vice Chancellor of IT. At the end, he basically said there is nothing he could do to keep my site functioning. I told him I will continue to find ways around them blocking me even though I don't enjoy playing cat and mouse.

Also, the site has become a huge hit among incoming freshmen. Back when I posted about it here, we had a few hundred users. Today we have a few thousand students relying on it. This has furthered my motivation to keep the site alive.

Here's the deal: the Python bot that scrapes the latest scheduling data off the uni's server keeps getting blocked by the uni's IDS. I've added random delays and intervals but haven't had much luck. After a day or so, the IDS removes the block, only to put it back later.

The IDS blocks my personal uni account, not the IP. When I first made the app, the data was public. But then the uni, in an attempt to block the site, made the data non-public. So I had to modify the POST string and log in with my uni username/password to retrieve the data.

What I am doing right now:

- randomizing the number of threads between 2 and 4

- randomizing the pause between new requests between 1 and 3 seconds

- randomizing how often my bot runs between 3 and 7 minutes

This is what I get when the IDS blocks it:

> To protect the univeristy from a denial of service attack, script blockers have been enabled.
> Your transaction has been blocked besause you have tried to login too many times in a short period of time.
> Please wait and try again soon.

How do I fool this sucker?
======
Travis
If other students are relying on it, why not have them pass their credentials
into the app, then have your app use their login to acquire the data? If you
are secure about it, you should be able to avoid the uni's wrath while still
keeping your service alive.

~~~
zaidf
That's an option. Though asking users for their username/pass will cause
significant overhead on my part in terms of security as well as turn away many
of the users IMO.

------
zaidf
Now that I think about it, each time my bot starts (every few minutes), it
currently logs in and gets a new session ID, which I then use to retrieve the
data. I'm going to take the error message to heart and save the session ID
between bot runs. Not getting a new session ID should mean that I am no longer
logging in many times.

Will try it and report back as soon as they unblock me!

------
cperciva
_How do I fool this sucker?_

Don't. Attempting to circumvent access control measures could not get you
thrown in jail.

~~~
profquail
_Attempting to circumvent access control measures could not get you thrown in
jail._

If you don't have to worry about getting thrown in jail, I wonder why more
people aren't doing it? ;)

Anyway, to answer the OP's question...is there some combination of POST
options that loads the entire class schedule at once? I know my school's (U of
Alabama) system had a form where you could select all of the course
departments and load the entire schedule on one page.

Also, do you really need to reload the data so often? If the IT department is
worried about server load, perhaps you could arrange to get a data feed once
or twice per day (overnight would probably be best).

~~~
zaidf
Yeah, the function of the site is to alert you the moment a class opens up. So
I need to get the data as often as I can.

 _If the IT department is worried about server load, perhaps you could arrange
to get a data feed once or twice per day (overnight would probably be best)._

Tried that. All they would have to do is put all the data on one page, so that
I don't have to make 130 requests (one for each department). They refused,
saying they don't have the resources. I've tried to be as nice as I possibly
can be and they've been as dick-like as they can be.

My last resort would be to ask our user-base to call the IT office and voice
their support.

~~~
anamax
> My last resort would be to ask our user-base to call the IT office and voice
> their support.

Why isn't that your first resort?

~~~
HalcyonMuse
This seems like a great idea to me, as it could get you a job, and make your
service much faster, in that you could build your notification system into the
class scheduling system, which would mean you wouldn't need to poll their
system every 3-7 minutes - it could be event-driven instead. (Dropping a class
could trigger a check of who's watching the class and notify recipients that
way.)

Writing in that line or two of code would also take care of the school's
strange DoS concerns.
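
To make the event-driven idea concrete, here is a rough sketch of what that hook might look like if it lived inside the scheduling system. Everything in it is made up for illustration: `Registrar`, `watch`, `drop`, and `notify` are hypothetical names, not anything from the university's actual software, and `notify` just records the alert where a real system would send an email or text.

```python
from dataclasses import dataclass, field


@dataclass
class Registrar:
    """Hypothetical stand-in for the class scheduling system."""
    open_seats: dict = field(default_factory=dict)  # course -> free seats
    watchers: dict = field(default_factory=dict)    # course -> set of students
    sent: list = field(default_factory=list)        # (student, course) alerts

    def watch(self, student, course):
        """Register a student's interest in a full course."""
        self.watchers.setdefault(course, set()).add(student)

    def drop(self, course):
        """Someone drops the course: free a seat, then alert the watchers."""
        self.open_seats[course] = self.open_seats.get(course, 0) + 1
        # The "line or two of code": the drop itself triggers the check,
        # so nothing has to poll the schedule.
        for student in sorted(self.watchers.get(course, ())):
            self.notify(student, course)

    def notify(self, student, course):
        """Assumption: a real system would email or text here."""
        self.sent.append((student, course))


registrar = Registrar()
registrar.watch("alice", "CS101")
registrar.watch("bob", "CS101")
registrar.drop("CS101")    # alerts alice and bob immediately
registrar.drop("MATH200")  # nobody is watching, so no alerts
```

The point is that the work happens once per drop instead of 130 requests every few minutes, which is why it would also address the load concern.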

Of course, if profit is the goal, this isn't an option (though it doesn't look
like you're trying to monetize this... kudos for that).

Also, this removes the exclusivity from classhunt (as any student would see
this while browsing the schedule online). I'm sure you can make more
inferences about this option. (Less social capital in the business world as
you have nothing to point at and say "I did that," etc.)

