Find Occurrences of a Text String in Files

I had to search for occurrences of a string in particular files, and using Cygwin I did the following:

First, find all files whose names end in .js, .htm, .html, or .ascx:

$ find -regextype posix-extended -regex ".+\.(js|html?|ascx)$"
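To see what the pattern selects, here is a small sketch that builds a throwaway directory and runs the same find against it. The file names and the demo variable are made up for illustration, and it assumes GNU find (the one Cygwin ships). Note that -regex is matched against the whole path, which is why the pattern starts with .+

```shell
# Scratch directory with a few hypothetical files.
demo=$(mktemp -d)
touch "$demo/app.js" "$demo/index.html" "$demo/old.htm" \
      "$demo/form.ascx" "$demo/notes.txt" "$demo/data.json"

# Same expression as above, with the starting directory given explicitly.
# Single quotes keep the shell away from the backslash and the dollar sign.
found=$(find "$demo" -regextype posix-extended -regex '.+\.(js|html?|ascx)$' | sort)
echo "$found"
```

Only app.js, form.ascx, index.html and old.htm should be listed. data.json is skipped because the pattern requires the name to end right after ".js".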

Then search for lines that contain the following JavaScript construct:

for( var x in object)

I ended up with this regular expression to match the JavaScript:

/for\s?\(.+?\bin\b.+?\)/

Although not strictly necessary, I test for word boundaries around "in"; it would be sufficient to match the surrounding spaces, as in " in ". To make grep accept this, it is run with options that show the file name (-H), show the line number (-n), print only the matching part of the line (-o), and use Perl-compatible regexps (-P):

grep -nHoP "for\s?\(.+?\bin\b.+?\)"
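A quick way to try the expression before pointing it at real files is to run it on a small sample. This is only a sketch: the sample file and its three lines are invented, and it assumes a GNU grep built with -P support.

```shell
# Hypothetical sample: one for-in loop and two lines that should not match.
sample=$(mktemp)
cat > "$sample" <<'EOF'
for (var i = 0; i < 10; i++) { total += i; }
for( var x in object) { print(x); }
var index = items.indexOf(key);
EOF

# -n prefixes the line number, -o prints only the matched text.
# -H is left out so the temporary file name does not clutter the output.
matches=$(grep -noP 'for\s?\(.+?\bin\b.+?\)' "$sample")
echo "$matches"
```

On this sample it prints 2:for( var x in object). The ordinary counting loop on line 1 has no standalone "in", and "indexOf" on line 3 is not preceded by "for(".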

Put this into a find expression, where the -exec flag lets you run the grep command on each file found:

$ find -regextype posix-extended -regex ".+\.(js|html?|ascx)$" -exec grep -nHoP "for\s?\(.+?\bin\b.+?\)" '{}' \;
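Here is the combined command run against a scratch directory, as a sketch under the same assumptions (GNU find, grep with -P). The files and their contents are hypothetical. It also shows a variant that ends -exec with + instead of \; so that find passes many file names to a single grep call rather than starting one grep per file.

```shell
demo=$(mktemp -d)
printf 'for (var key in settings) { use(key); }\n' > "$demo/a.js"
printf '<script>for(var i = 0; i < n; i++) {}</script>\n' > "$demo/b.html"
printf 'for (var key in notes) {}\n' > "$demo/c.txt"

# Form used above: one grep process per file found.
per_file=$(find "$demo" -regextype posix-extended -regex '.+\.(js|html?|ascx)$' \
  -exec grep -nHoP 'for\s?\(.+?\bin\b.+?\)' '{}' \;)

# Variant: "+" batches the file names into as few grep calls as possible.
batched=$(find "$demo" -regextype posix-extended -regex '.+\.(js|html?|ascx)$' \
  -exec grep -nHoP 'for\s?\(.+?\bin\b.+?\)' '{}' +)

echo "$per_file"
```

Both forms should report the single hit in a.js, in the shape path:line:match. c.txt contains a for-in loop too, but find never hands it to grep.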

Choosing a -regextype for find
The regextype had to be changed, and I found that posix-extended, posix-awk, and posix-egrep all work with the chosen regexp, but posix-basic did not.
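The difference is easy to reproduce. The sketch below counts matches per regextype in a scratch directory; the file names and the count_matches helper are made up for illustration. A likely explanation, which the post itself does not state, is that POSIX basic syntax treats an unescaped +, ?, |, ( and ) as literal characters, so the pattern only matches paths that literally contain them.

```shell
demo=$(mktemp -d)
touch "$demo/app.js" "$demo/index.html" "$demo/form.ascx"

pattern='.+\.(js|html?|ascx)$'

# Hypothetical helper: number of paths find reports for a given regextype.
count_matches() {
  find "$demo" -regextype "$1" -regex "$pattern" | wc -l | tr -d ' '
}

for type in posix-extended posix-awk posix-egrep posix-basic; do
  echo "$type: $(count_matches "$type")"
done
```

With GNU find the first three types should each report 3 files and posix-basic should report 0.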

4 Responses to “Find Occurrences of a Text String in Files”

  1. Thomas Baekdal Says:

    Jesper, you should try RegexBuddy :)
    http://www.regexbuddy.com/

  2. Jesper Rønn-Jensen Says:

    @Thomas, I am glad you pointed that out. It comes down to usability :)

    Actually I have a license for RegexBuddy, AceText, PowerGREP, and more of Jan Goyvaerts' utilities.

    I was very fond of PowerGREP, especially because it had such good regex support. But I stopped using it, usability geek that I am, primarily because of the following:

    1) The interface was cluttered and it became harder to use. I repeatedly had problems and wasted brain cycles setting up simple things like which files to search in. The program had the unlucky ability to always present the wrong defaults when I used it.

    2) The other thing that’s important to me is that I want the search to be reproducible by everybody in our project team. Even if I could save a search, I don’t want it in a proprietary format.

    Having said that, I used the PowerGREP family of programs extensively for a period of time, approximately 2 years ago, so things may have improved.
    However, I remember the shift from version 3 to 4, when the interface got too complicated (at least for me).

    I want programs that get out of my way and let me focus on getting my work done. PowerGREP required too many of my brain cycles :)

  3. Thomas Baekdal Says:

    He he, well, I pointed out RegexBuddy primarily because I thought your command lines looked a tad too complex. I would have a hard time remembering what to write. I actually switched from a command line interface to RegexBuddy because I thought it was more usable and easier to work with :)

    I guess then it is a matter of personal taste…

    BTW: All this reminds me of the age-old quote by Jamie Zawinski:

    Some people, when confronted with a problem, think
    “I know, I’ll use regular expressions.” Now they have two problems.