Nmap Development mailing list archives

Re: html page extensions


From: jah <jah () zadkiel plus com>
Date: Mon, 14 Sep 2009 11:30:47 +0100

On 14/09/2009 04:48, Patrick Donnelly wrote:
Hi nmap-dev,

I'm working on an http spider script and need to know what file
extensions are common for html pages. Here's a list I have so far (in
Lua patterns):

 local html_page_extensions = {
   "%.html$", -- regular html page
   "%.htm$", -- regular html page
   "%.shtml$", -- html with server-side includes
   "%.phtml$", -- html parsed by php
   "%.php$", -- php
   "%.pl$", -- perl
   "%.cgi$", -- cgi
   "%.jsp$", -- Java Server Pages
   "%.asp$", -- Active Server Pages (Microsoft)
 };

I'm also checking pages that have no extension (as that is apparently
very common). Does anyone have more to add?
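For illustration, here is a rough sketch of how a spider might apply the list above to a URL path. The helper name looks_like_html_page is made up for this example, and stripping the query string and lowercasing the path are assumptions about what the script would want rather than anything stated in the original message.

```lua
-- Patterns from the list above; each is anchored to the end of the path.
local html_page_extensions = {
  "%.html$", "%.htm$", "%.shtml$", "%.phtml$",
  "%.php$", "%.pl$", "%.cgi$", "%.jsp$", "%.asp$",
}

-- Hypothetical helper: true if the path looks like it serves an html page.
local function looks_like_html_page(path)
  -- Drop any query string or fragment so "$" anchors at the end of the path.
  path = path:gsub("[?#].*$", "")
  -- Assumption: extensions should match case-insensitively.
  path = path:lower()
  for _, pattern in ipairs(html_page_extensions) do
    if path:find(pattern) then
      return true
    end
  end
  -- A last path segment with no "." at all has no extension; accept it too.
  local last_segment = path:match("([^/]*)$")
  return not last_segment:find(".", 1, true)
end
```

With this, "/index.php?id=1" and "/about" would be accepted, while "/img/logo.png" would be skipped.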

  
.xht, .xhtml, .htmls, .hta, .cfml, .adp, .aht, .ahtm, .ahtml, .mht, .mhtm,
.mhtml, .jht, .jhtm, .jhtml
You could probably write a script that performs a
filetype:<some_random_extension> Google search, sends a HEAD request for
the first result and checks for Content-Type: text/html.  You'll likely
end up with a list as long as your arm.
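The Content-Type check in such a script could look roughly like the sketch below. It only covers the comparison itself; the search and the HEAD request are left out, and the helper name is_html_content_type is invented for the example. It assumes the header value arrives as a plain string, possibly with parameters such as a charset.

```lua
-- Hypothetical helper: true if a Content-Type header value denotes text/html.
local function is_html_content_type(value)
  -- A missing header (nil) is treated as "not html".
  if type(value) ~= "string" then
    return false
  end
  -- Keep only the media type: drop parameters such as "; charset=UTF-8".
  local media_type = value:match("^%s*([^;%s]+)")
  -- Media types are case-insensitive, so compare in lower case.
  return media_type ~= nil and media_type:lower() == "text/html"
end
```

A value like "Text/HTML; charset=UTF-8" would pass, while "application/pdf" or a missing header would not.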
.php3, .php4 and .php5 are common (frequently used to distinguish between
versions of php), but php followed by any number is likely.  I did a
few Google searches for "filetype:phpN" for different values of N; the
result counts were:

0 - 542
1 - 22500
2 - 1610
3 - 57000000
4 - 5190000
5 - 3040000
6 - 6490
7 - 245
8 - 143
9 - 568
10 - 56
...
29 - 23
30 - 36
...
200 - 3
...
9999 - 1
10000 - 0

So it seems that for any non-negative integer below 10000 there's a
possibility that the filetype is in use.  (The single result at 9999 is
a bit of an oddity since the 9999 is a parameter to the php script).
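If the spider wanted to cover all of these at once, one option is a single pattern for ".php" followed by up to four digits, sketched below. The four-digit limit is my reading of the counts above, and has_php_extension is a made-up name. Note that this also accepts zero-padded forms such as ".php0007".

```lua
-- ".php" followed by zero to four digits, anchored at the end of the path.
-- Lua patterns have no {m,n} repetition, so the optional digits are spelled out.
local php_extension = "%.php%d?%d?%d?%d?$"

-- Hypothetical helper: true if the path ends in .php or .phpN with N < 10000.
local function has_php_extension(path)
  return path:lower():find(php_extension) ~= nil
end
```

This would match "/a.php", "/a.php3" and "/a.php9999" but not "/a.php10000" or "/a.phps".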

jah

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org

