The Mysterious Case of the Regex Dot

So, I’m in the middle of organizing my photos into folders, into something more usable than what the default Photos application on the Mac gives me[1].

I was trying to count the number of photos/videos[2] in each subdirectory of my …/2018/ folder. Here is the command that eventually worked:

$ time find * |grep IMG|grep -o '^[0-9][0-9]/.'|uniq -c
22 04/0
3297 05/1
104 05/2
100 06/0
1830 06/2
2040 10/2

I first tried the supposedly logical:

$ time find * |grep IMG|grep -o ^..|uniq -c|head
1 04
1 /0
1 1/
1 20
1 18
1 04
1 01
1 -0
1 00
1 41

Interestingly, grep (and/or the OS) seemed to be taking the first two characters off the front of each line, and then putting the rest of the line back into the STDIN hopper as though it were a fresh line, so that ^ matched again and again.

As this was not doing what I expected (or wanted), I tried:

$ time find * |grep IMG|grep -o '^[0-9][0-9]/'|uniq -c|head
1 04/
1 01/
1 04/
1 01/
1 04/
1 01/
1 04/
1 01/
1 04/
1 01/

Which, while better…

$ time find * |grep IMG|grep -o '^[0-9][0-9]/'|uniq -c|sort|uniq -c
22 1 01/
22 1 04/
3501 1 05/
1930 1 06/
2040 1 10/
3297 1 13/
104 1 23/
1830 1 24/
2040 1 27/

…gave me too many results by about a factor of two, and somehow found 27 months in the year.

I quickly figured out that while parsing mm/dd/yyyymmdd-hash/IMG_[0-9][0-9][0-9][0-9].[FILETYPE], this particular grep/OS combination will happily grab the ‘mm/’, and then also grab the ‘dd/’. This habit, while charming, does not solve my problem.
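The behaviour is easy to model in a few lines of shell. This is only a sketch of what the output suggests is happening, not a description of grep's internals: chomp_matches is a made-up helper, the sample path is invented to fit the mm/dd/yyyymmdd-hash layout, and the pattern is passed without the leading ^ because the sed expression supplies the anchor itself.

```shell
# Model of the observed behaviour: match at the start of the string, print
# the match, drop it, and try again on whatever is left.
# chomp_matches is a hypothetical helper, not part of grep.
#   $1 = basic regex WITHOUT the leading ^ (must not contain '|')
#   $2 = one line of input
chomp_matches() {
  pattern=$1
  rest=$2
  while [ -n "$rest" ]; do
    # Print only the part matched at the very start of $rest, if any
    match=$(printf '%s\n' "$rest" | sed -n "s|^\($pattern\).*|\1|p")
    [ -n "$match" ] || break
    printf '%s\n' "$match"
    # Chomp the match off the front and go around again
    rest=${rest#"$match"}
  done
}

# Invented sample path; prints 04/ and then 01/ on separate lines
chomp_matches '[0-9][0-9]/' '04/01/20180401-000041/IMG_0001.JPG'
```

Once the ‘mm/’ and ‘dd/’ have been chomped, the remainder starts with the yyyymmdd part, which has no slash in the third position, so the loop stops there.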

After a Google search (https://www.google.com/search?q=grep+one+match+per+line) proved unfruitful, I decided to try:

$ time find * |grep IMG|grep -o '^[0-9][0-9]/.'|uniq -c
22 04/0
3297 05/1
104 05/2
100 06/0
1830 06/2
2040 10/2

and it worked!

I was stumped, until I figured out what was going on. The issues that I had been seeing before were entirely because, after each match, grep was treating the start of the newly chomped string as the start of the line, so ^ could match there again. By having the trailing dot chomp the first character of the next would-be ‘match’, I was stopping grep from finding any more matches.
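For anyone who would rather not lean on this quirk at all, one alternative is to let sed do the extraction, since a single s command yields at most one result per line. This is a minimal sketch, not what I actually ran: prefix_counts is a made-up name and the sample paths are invented.

```shell
# prefix_counts is a hypothetical helper: it reads paths on stdin, keeps only
# lines containing IMG, reduces each to its leading "two digits, slash, one
# character" prefix, and counts runs of identical prefixes.
prefix_counts() {
  grep IMG |
    sed -n 's|^\([0-9][0-9]/.\).*|\1|p' |
    uniq -c
}

# Invented sample paths in the mm/dd/yyyymmdd-hash layout
printf '%s\n' \
  '04/01/20180401-000041/IMG_0001.JPG' \
  '04/01/20180401-000041/IMG_0002.PNG' \
  '05/13/20180513-000102/IMG_0003.MOV' |
  prefix_counts
```

On the real folder this would be fed from find, as in `find * | prefix_counts`. Like the original pipeline, it assumes the paths arrive already grouped by directory; if they do not, a sort before uniq -c would be needed.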

#themoreyouknow

[1] Right now, when Photos organizes photos, it puts each photo into its own folder, based on year/month/day/yyyymmdd-hash, which makes it super-annoying to use anything but the Photos app, which is itself super-slow and annoying to use.

[2] The images are all in the format ‘IMG_[0-9][0-9][0-9][0-9].[FILETYPE]’, where FILETYPE can be ‘PNG’ (screenshots), ‘JPG’ (camera pictures), ‘MOV’ (camera movies), ‘GIF’ (saved .gifs), or perhaps some other recognized image format.
