regex - Extracting phone number issue in R


I have phone numbers like this:

ll <- readLines(textConnection("(412) 573-7777 opt 1
563.785.1655 x1797
(567) 523-1534 x7753
(567) 483-2119 x 477
(451) 897-mall
(342) 668-6255 ext 7
(317) 737-3377 opt 4
(239) 572-8878 x 3
233.785.1655 x1776
(138) 761-6877 x 4
(411) 446-6626 x 14
(412) 337-3332x19
412.393.3177 x24
327.961.1757 ext.4"))

What regex should I write to get:

xxx-xxx-xxxx 

I tried this one:

gsub('[(]([0-9]{3})[)] ([0-9]{3})[-]([0-9]{4}).*','\\1-\\2-\\3',ll) 

It doesn't cover all the possibilities. I think I could do it using several regex patterns, but I think it can be done with a single regex.

If you also want to extract numbers that are represented with letters, you can use the following regex in gsub:

gsub('[(]?([0-9]{3})[)]?[. -]([a-z0-9]{3})[. -]([a-z0-9]{4}).*','\\1-\\2-\\3',ll) 

See the ideone demo.

You can remove the a-z ranges from the character classes to match only numbers with no letters.
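As a rough sketch of that digits-only variant: gsub returns an input unchanged when the pattern does not match, so an entry such as "(451) 897-mall" would pass through as is. The names digits_only, samples, matched and cleaned below are made up for illustration, and replacing non-matches with NA is just one possible way to handle them.

```r
# Digits-only variant: the a-z ranges are dropped from the last two groups.
digits_only <- '[(]?([0-9]{3})[)]?[. -]([0-9]{3})[. -]([0-9]{4}).*'

# A few entries taken from the question's data.
samples <- c("(412) 573-7777 opt 1", "563.785.1655 x1797", "(451) 897-mall")

# grepl() reports which entries the pattern matches at all.
matched <- grepl(digits_only, samples)

# Reformat the matches; mark the rest as NA instead of passing them through.
cleaned <- ifelse(matched,
                  gsub(digits_only, '\\1-\\2-\\3', samples),
                  NA_character_)
print(cleaned)
# [1] "412-573-7777" "563-785-1655" NA
```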

Regex breakdown:

  • [(]? - optional (
  • ([0-9]{3}) - 3 digits
  • [)]? - optional )
  • [. -] - either dot, or space, or hyphen
  • ([a-z0-9]{3}) - 3 digit or letter sequence
  • [. -] - either dot, or space, or hyphen
  • ([a-z0-9]{4}) - 4 digit or letter sequence
  • .* - any number of characters up to the end, so that extensions and options are dropped by the replacement
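To make the breakdown concrete, here is a small sketch that assembles the same pattern from its pieces with paste0 and applies it to three entries from the question. The variable names pattern, samples and result are only illustrative.

```r
# Assemble the pattern piece by piece, mirroring the breakdown.
pattern <- paste0(
  '[(]?',           # optional (
  '([0-9]{3})',     # group 1: 3 digits
  '[)]?',           # optional )
  '[. -]',          # dot, space, or hyphen
  '([a-z0-9]{3})',  # group 2: 3 digits or letters
  '[. -]',          # dot, space, or hyphen
  '([a-z0-9]{4})',  # group 3: 4 digits or letters
  '.*'              # everything that follows (extension, option, ...)
)

samples <- c("(412) 337-3332x19", "327.961.1757 ext.4", "(451) 897-mall")

# Keep only the three captured groups, joined by hyphens.
result <- gsub(pattern, '\\1-\\2-\\3', samples)
print(result)
# [1] "412-337-3332" "327-961-1757" "451-897-mall"
```

Because the trailing .* consumes the rest of each string, there is at most one match per element, so sub would give the same result as gsub here.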
