
Extend robots.txt to specify session variable names? - foxylad
To cope with visitors who block cookies, we use a query variable in our URLs to maintain sessions. In our case we use "z", because it is short and unlikely to be used for anything else. So our URLs look like www.example.com/contact?z=123456.

However, this confuses most bots because they assume, reasonably enough, that the query variable indicates a distinct resource. So each time a bot visits, it gets a new session variable and ends up crawling tens if not hundreds of copies of the same page. I'm aware that some bots (Google for example) allow you to specify the name of your session variable, but there are now thousands of bots, and informing each one of your session variable name seems impractical.

So would it be a good idea to extend robots.txt to allow specification of query variables that do not indicate distinct resources? "Ignore-query-var: z" perhaps?
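
For illustration, here is a rough Python sketch of how a crawler might honor such a directive. "Ignore-query-var" is only the proposal above, not part of any robots.txt standard, and the function names `ignored_vars_from_robots` and `strip_ignored_vars` are made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def ignored_vars_from_robots(robots_txt):
    """Collect names from the hypothetical 'Ignore-query-var' lines."""
    names = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0]  # drop robots.txt comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "ignore-query-var":
            names.add(value.strip())
    return names


def strip_ignored_vars(url, ignored):
    """Return the URL with the ignored query variables removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With this, a crawler would treat http://www.example.com/contact?z=123456 and http://www.example.com/contact?z=654321 as the same page, because both reduce to the same URL once "z" is dropped.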
======
pastyboy
Test the useragent for bots before adding the variable. This requires an array
of the most common bots; if the visitor is a bot, don't create a session.

~~~
foxylad
Maintaining a list of bot useragents would be a pain, but I guess testing if
the useragent contains 'bot' would work for most.
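
A minimal sketch of that check in Python, assuming the session variable is "z" as in the original post. The token list and the function names `looks_like_bot` and `session_url` are illustrative only; "crawler" and "spider" are extra guesses beyond the 'bot' test, and a bot that sends a browser-like useragent would slip through.

```python
# Substrings that suggest a crawler. 'bot' is the test suggested above;
# 'crawler' and 'spider' are additional assumptions for this sketch.
BOT_TOKENS = ("bot", "crawler", "spider")


def looks_like_bot(user_agent):
    """Case-insensitive substring test on the User-Agent header."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BOT_TOKENS)


def session_url(path, user_agent, session_id):
    """Append the 'z' session variable only for non-bot visitors."""
    if looks_like_bot(user_agent):
        return path  # no session for bots, so they see one stable URL
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}z={session_id}"
```

A visitor identifying as Googlebot would get /contact, while an ordinary browser would get /contact?z=123456.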

