Monday, February 7, 2011

How do you parse a filename in bash?

I have a filename in a format like:

system-source-yyyymmdd.dat

I'd like to be able to parse out the different bits of the filename using the "-" as a delimiter.

  • Use the cut command.

    e.g. echo "system-source-yyyymmdd.dat" | cut -f1 -d'-'

    will extract the first bit.

    Change the value of the -f parameter to get the appropriate parts.

    Here's a guide on the cut command.

    From David
  • You can use the cut command to get at each of the 3 'fields', e.g.:

    $ echo "system-source-yyyymmdd.dat" | cut -d'-' -f2
    source
    

    "-d" specifies the delimiter; "-f" specifies the number of the field you require.

    Jon Ericson : I'm curious why you added the # prompt. Normally, that prompt indicates the root or superuser. In general, I'd think stuff like trying out the **cut** command would be better done as a regular user. I'd have used the $ prompt.
    Bobby Jack : Oh, yeah - good point. I must admit, I was logged in as root at the time and simply went for it - a bad habit, I know. Having said that, I think echo and cut are two of the least harmful commands :) But, for the sake of completeness, I'll certainly update the example right away. Cheers.
    From Bobby Jack
  • Fantastic - an answer in 3 minutes - that's quicker than phoning a friend!

  • Depending on your needs, awk is more flexible than cut. A first teaser:

    $ echo "system-source-yyyymmdd.dat" \
        | awk -F- '{printf "System: %s\nSource: %s\nYear: %s\nMonth: %s\nDay: %s\n",
                  $1,$2,substr($3,1,4),substr($3,5,2),substr($3,7,2)}'
    System: system
    Source: source
    Year: yyyy
    Month: mm
    Day: dd
    

    The problem is that describing awk as 'more flexible' is rather like calling the iPhone an enhanced cell phone ;-)

    From flight
  • Another method is to use the shell's internal parsing tools, which avoids the cost of creating child processes:

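    oIFS=$IFS          # save the current field separator
    IFS=-
    file="system-source-yyyymmdd.dat"
    set -- $file       # unquoted on purpose: splits on "-" into $1, $2, $3
    IFS=$oIFS          # restore the original separator
    echo "Source is $2"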
    
  • A nice and elegant (in my mind :-) ) solution using only built-ins is to put the parts into an array:

    var='system-source-yyyymmdd.dat'
    parts=(${var//-/ })
    

    Then, you can find the parts in the array...

    echo "${parts[0]}"  # ==> system
    echo "${parts[1]}"  # ==> source
    echo "${parts[2]}"  # ==> yyyymmdd.dat
    

    Caveat: this will not work if the filename contains "strange" characters such as spaces or, heaven forbid, glob characters like * or ?, because the unquoted expansion undergoes word splitting and pathname expansion.
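
For completeness, here is a minimal sketch of the same split done with plain parameter expansion, which needs no child processes and no changes to IFS. It is not taken from any of the answers above. The sample date digits and the variable names (src, stamp, monthday) are made up for illustration, and the sketch assumes the name has exactly two "-" separators, an eight-character date, and a .dat extension.

```shell
# Hypothetical sample name; the date digits are made up for illustration.
file='system-source-20110207.dat'

system=${file%%-*}      # drop the longest suffix that starts at a "-"
rest=${file#*-}         # drop the shortest prefix that ends at a "-"
src=${rest%%-*}
stamp=${rest#*-}
stamp=${stamp%.dat}     # strip the extension

year=${stamp%????}      # drop the last four characters (mmdd)
monthday=${stamp#????}  # drop the first four characters (yyyy)
month=${monthday%??}
day=${monthday#??}

printf 'System: %s\nSource: %s\nYear: %s\nMonth: %s\nDay: %s\n' \
    "$system" "$src" "$year" "$month" "$day"
```

Because every expansion is either quoted or on the right-hand side of an assignment, spaces and glob characters in the filename are left alone.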
