PaulDotCom mailing list archives

Re: Better RegEX to find IPs


From: Adrian Crenshaw <irongeek () irongeek com>
Date: Wed, 1 Dec 2010 19:52:33 -0500

Uhm, that's the one I'm already using.

Adrian

On Wed, Dec 1, 2010 at 7:47 PM, Adrian Crenshaw <irongeek () irongeek com> wrote:

Thanks, I plan to try it.
Adrian


On Wed, Dec 1, 2010 at 7:01 PM, Grymoire <pauldotcom () grymoire com> wrote:


Adrian, I have never programmed in Python before, so excuse the clumsy
Python code.

But I did test this before I replied.
I used your original string as a start:
---------------------
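#!/usr/bin/python
import re

# Read the whole file into one string.
with open('/tmp/workfile', 'r') as f:
    TextBlob = f.read()

# Four dot-separated octets, each restricted to the range 0-255.
IPsInFile = re.findall(
    r'(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)',
    TextBlob)
print(IPsInFile)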
--------------------

I called this script findre1.py and it worked fine. Everything came
out on one line, with quotes around the addresses and commas
between. So I am not sure why you are getting different results.

I then changed the one line to use a simpler regex:

IPsInFile = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', TextBlob)

This was the second version: findre2.py

It's hard comparing the results when everything is on one line.
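
One way around that, as a rough sketch, is to have Python print one
address per line. The sample string and the helper name one_per_line
below are made up for illustration; they are not part of either script.

```python
import re

# Hypothetical sample text standing in for the contents of /tmp/workfile.
sample = "a 10.0.0.1 b 192.168.1.20 c"

def one_per_line(matches):
    """Return the matches sorted, one per line (hypothetical helper)."""
    return "\n".join(sorted(matches))

# Same simple pattern as findre2.py, written as a raw string.
found = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', sample)
print(one_per_line(found))
```

Output in this shape can be redirected to a file and passed to diff
without further processing.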

I copied a log file into /tmp/workfile and compared the two this way:

# findre1.py | tr ',' '\n' | sort >a
# findre2.py | tr ',' '\n' | sort >b
# diff a b

and the difference was that the second version printed out one
additional value:

# diff a b
2992a2993
> '960.435.12.291'

So the first one worked fine, and found 2992 IP addresses. The second
one may give you invalid IP addresses.
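
The same comparison can be sketched in a single script. This is only
an illustration: the sample string is invented, and the names STRICT,
LOOSE and only_loose are assumptions, not anything from the scripts
above.

```python
import re

# The two patterns from findre1.py and findre2.py.
STRICT = (r'(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}'
          r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)')
LOOSE = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

# Hypothetical input: two valid addresses and one out-of-range value.
sample = "ok 192.168.1.10 and 10.0.0.255 but not 960.435.12.291"

strict_hits = re.findall(STRICT, sample)
loose_hits = re.findall(LOOSE, sample)

# Values the simple pattern reports that the range-checked one does not.
only_loose = [ip for ip in loose_hits if ip not in strict_hits]
print(only_loose)
```

On this sample the only value left over is the out-of-range one, which
matches what the diff showed for the log file.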


- Grymoire
_______________________________________________
Pauldotcom mailing list
Pauldotcom () mail pauldotcom com
http://mail.pauldotcom.com/cgi-bin/mailman/listinfo/pauldotcom
Main Web Site: http://pauldotcom.com
