regex - Extracting phone number issue in R
I have phone numbers like this:
ll <- readLines(textConnection("(412) 573-7777 opt 1
563.785.1655 x1797
(567) 523-1534 x7753
(567) 483-2119 x 477
(451) 897-mall
(342) 668-6255 ext 7
(317) 737-3377 opt 4
(239) 572-8878 x 3
233.785.1655 x1776
(138) 761-6877 x 4
(411) 446-6626 x 14
(412) 337-3332x19
412.393.3177 x24
327.961.1757 ext.4"))
What regex should I write to get:
xxx-xxx-xxxx
I tried this one:
gsub('[(]([0-9]{3})[)] ([0-9]{3})[-]([0-9]{4}).*','\\1-\\2-\\3',ll)
It doesn't cover all the possibilities. I think I could do it using several regex patterns, but I think it can be done with a single regex.
If you also want to extract numbers that are represented with letters, you can use the following regex in gsub:
gsub('[(]?([0-9]{3})[)]?[. -]([a-z0-9]{3})[. -]([a-z0-9]{4}).*','\\1-\\2-\\3',ll)
See the IDEONE demo.
You can remove a-z from the character classes to match only numbers with no letters.
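As a minimal sketch of that digits-only variant, the snippet below drops a-z from both character classes and runs the pattern on three entries taken from the question. The variable names x and digits_only are illustrative and not part of the original answer. Note that an entry that does not match, such as the one ending in letters, is returned unchanged by gsub rather than dropped.

```r
# Three sample entries from the question (x is an illustrative name).
x <- c("(412) 573-7777 opt 1", "563.785.1655 x1797", "(451) 897-mall")

# Same pattern as above, with a-z removed from the character classes.
digits_only <- gsub('[(]?([0-9]{3})[)]?[. -]([0-9]{3})[. -]([0-9]{4}).*',
                    '\\1-\\2-\\3', x)

print(digits_only)
# "412-573-7777" "563-785-1655" "(451) 897-mall"
```

If unmatched entries should be filtered out, grepl with the same pattern can be used to select only the matching elements first.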
Regex:
[(]? - optional (
([0-9]{3}) - 3 digits
[)]? - optional )
[. -] - either a dot, a space, or a hyphen
([a-z0-9]{3}) - 3 digit or letter sequence
[. -] - either a dot, a space, or a hyphen
([a-z0-9]{4}) - 4 digit or letter sequence
.* - any number of characters to the end
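The breakdown above can be checked by assembling the pattern piece by piece. This is only a sketch: the names pattern, sample and res are assumptions made for illustration, and the sample is a subset of the entries from the question.

```r
# Build the pattern from the pieces described in the breakdown.
pattern <- paste0(
  "[(]?",           # optional (
  "([0-9]{3})",     # group 1: 3 digits
  "[)]?",           # optional )
  "[. -]",          # dot, space, or hyphen
  "([a-z0-9]{3})",  # group 2: 3 digits or letters
  "[. -]",          # dot, space, or hyphen
  "([a-z0-9]{4})",  # group 3: 4 digits or letters
  ".*"              # everything up to the end (extensions, options)
)

sample <- c("(567) 483-2119 x 477", "(451) 897-mall",
            "(412) 337-3332x19", "327.961.1757 ext.4")

# Replace each whole string with the three captured groups joined by hyphens.
res <- gsub(pattern, "\\1-\\2-\\3", sample)

print(res)
# "567-483-2119" "451-897-mall" "412-337-3332" "327-961-1757"
```

Because the trailing .* consumes the rest of the string, the extension text ("x 477", "ext.4", and so on) is discarded by the replacement.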