Practice Exercise: Text Manipulation with sed and awk

Objective

Practice text manipulation techniques using the sed and awk commands in a Linux environment.

Task 1: Introduction to sed

  • Open a terminal window.
  • Use the vi_journal.txt file created earlier, which contains the following text:
    vi, a powerful and versatile text editor, is renowned for its efficient search and replace functionality. With vi, you can effortlessly locate specific words or phrases within your document using its robust search feature, and then replace them seamlessly with new content using the replace command. Whether you're editing code, configuring files, or crafting documents, vi's search and replace capabilities empower you to make precise and rapid changes, enhancing your productivity and control over your text editing tasks.
    
  • Use sed to replace a word or phrase in the vi_journal.txt file with another word. Save the result as modified.txt.
    [intern@intern-a1t-inf-lnx1 ~]$ sed 's/\<vi\>/emacs/g' vi_journal.txt > modified.txt
    [intern@intern-a1t-inf-lnx1 ~]$ cat modified.txt
    emacs, a powerful and versatile text editor, is renowned for its efficient search and replace functionality. With emacs, you can effortlessly locate specific words or phrases within your document using its robust search feature, and then replace them seamlessly with new content using the replace command. Whether you're editing code, configuring files, or crafting documents, emacs's search and replace capabilities empower you to make precise and rapid changes, enhancing your productivity and control over your text editing tasks.
    
  • \< and \> are GNU sed word-boundary anchors, so only the standalone word vi is replaced. Without them, s/vi/emacs/g also rewrites the "vi" inside "productivity", producing "productiemacsty".
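
Whether vi is matched as a whole word matters here: a plain s/vi/emacs/g matches those two letters anywhere, including inside longer words such as "productivity". The short sketch below leaves vi_journal.txt alone and runs a plain and a word-bounded substitution on one made-up sample sentence so you can compare the results. It assumes GNU sed, because \< and \> are GNU extensions.

```shell
#!/bin/sh
# Made-up sample sentence: "vi" appears as a word and inside "productivity".
sample="With vi, vi's search boosts productivity."

# Plain pattern: matches "vi" anywhere, including inside other words.
plain=$(printf '%s\n' "$sample" | sed 's/vi/emacs/g')

# Word-bounded pattern (GNU sed \< and \>): matches only the whole word "vi".
bounded=$(printf '%s\n' "$sample" | sed 's/\<vi\>/emacs/g')

printf 'plain:   %s\n' "$plain"
printf 'bounded: %s\n' "$bounded"
```

The plain version turns "productivity" into "productiemacsty", while the bounded version leaves it intact and still replaces "vi," and "vi's".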
    

Task 2: Advanced sed Usage

  • Create a new text file named data.txt with several lines of data.
  • Use sed to perform the following tasks:
  • Delete specific lines containing a certain pattern from data.txt.
  • Replace text in data.txt using regular expressions.
  • Append new text to the end of lines in data.txt.
    [intern@intern-a1t-inf-lnx1 ~]$ cat data.txt
    This is line 1.
    This is line 2 with some patterns.
    This is line 3.
    Pattern ABC should be removed from this line.
    This is line 5 with another pattern.
    Replace XYZ with ZZZ in this line.
    This is line 7.
    [intern@intern-a1t-inf-lnx1 ~]$ sed '/ABC/d' data.txt > data2.txt
    [intern@intern-a1t-inf-lnx1 ~]$ cat data2.txt
    This is line 1.
    This is line 2 with some patterns.
    This is line 3.
    This is line 5 with another pattern.
    Replace XYZ with ZZZ in this line.
    This is line 7.
    [intern@intern-a1t-inf-lnx1 ~]$ sed 's/XYZ/ZZZ/g' data2.txt > data3.txt
    [intern@intern-a1t-inf-lnx1 ~]$ cat data3.txt
    This is line 1.
    This is line 2 with some patterns.
    This is line 3.
    This is line 5 with another pattern.
    Replace ZZZ with ZZZ in this line.
    This is line 7.
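    [intern@intern-a1t-inf-lnx1 ~]$ sed -i '$a Appending a new line' data3.txt
    [intern@intern-a1t-inf-lnx1 ~]$ cat data3.txt
    This is line 1.
    This is line 2 with some patterns.
    This is line 3.
    This is line 5 with another pattern.
    Replace ZZZ with ZZZ in this line.
    This is line 7.
    Appending a new line
    
  • In '$a Appending a new line', the $ address selects the last line and the a command appends the given text after it (the one-line form of a is a GNU sed extension). The -i option edits data3.txt in place instead of printing the result to standard output.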
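
The session above deletes lines by pattern, replaces a literal string, and appends a line at the end of the file. Two related techniques from the task list are sketched below: a regular expression with a capture group, and appending text to the end of every line using s/$/.../, since $ matches the end of each line. The file name sample.txt, its two lines, and the "(checked)" marker are hypothetical; -E (extended regular expressions) is supported by both GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical two-line sample standing in for data.txt.
printf 'This is line 1.\nReplace XYZ with ZZZ in this line.\n' > sample.txt

# Extended regex (-E) with a capture group: any run of three uppercase
# letters is wrapped in square brackets; \1 refers to the captured text.
sed -E 's/([[:upper:]]{3})/[\1]/g' sample.txt > sample_regex.txt

# $ anchors the match at the end of each line, so this appends
# " (checked)" to every existing line instead of adding a new line.
sed 's/$/ (checked)/' sample.txt > sample_appended.txt

cat sample_regex.txt sample_appended.txt
```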
    

Task 3: Introduction to awk

  • Create a sample CSV file named sales.csv with some sample sales data (e.g., product, quantity, price).
  • Use awk to:
  • Calculate the total sales for each product.
  • Find the product with the highest sales.
    [intern@intern-a1t-inf-lnx1 ~]$ cat sales.csv
    Device,Quantity,Price
    Smartphone-A,10,599.99
    Tablet,12,399.99
    Smartphone-B,8,599.99
    Laptop,6,999.99
    Headphones-A,3,199.99
    Headphones-B,7,159.99
    [intern@intern-a1t-inf-lnx1 ~]$ awk -F',' 'NR > 1 {sales[$1] += $2 * $3} END {for (product in sales) print product, sales[product]}' sales.csv
    Tablet 4799.88
    Smartphone-A 5999.9
    Smartphone-B 4799.92
    Headphones-A 599.97
    Headphones-B 1119.93
    Laptop 5999.94
    
  • The command above needs a little explanation:
  • -F',' sets the field separator to a comma instead of the default whitespace.
  • NR > 1 skips the header line (NR is the number of the current record, i.e. the current line).
  • {sales[$1] += $2 * $3} multiplies the quantity by the price and adds the result to the associative array sales, using the product name as the key.
  • In other words: sales["Tablet"] += Quantity * Price
  • We used += in case a product is listed on more than one line. Since this data has no duplicate products, = would also work.
  • END runs the block that follows only after all lines have been processed. Here it loops through the array and prints each product with its total. The iteration order of for (product in sales) is unspecified, which is why the output is not in file order.
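
awk prints numbers with its default output format, which is why Smartphone-A appears as 5999.9 rather than 5999.90. A possible refinement, sketched below, uses printf with %.2f so every total shows two decimal places. The file sales_sample.csv is a hypothetical copy of the data above, written separately so the real sales.csv is left untouched.

```shell
#!/bin/sh
# Hypothetical copy of the sample data in a separate file.
cat > sales_sample.csv <<'EOF'
Device,Quantity,Price
Smartphone-A,10,599.99
Tablet,12,399.99
Smartphone-B,8,599.99
Laptop,6,999.99
Headphones-A,3,199.99
Headphones-B,7,159.99
EOF

# Same aggregation as above; printf "%.2f" always prints two decimal places.
awk -F',' 'NR > 1 { sales[$1] += $2 * $3 }
  END { for (product in sales) printf "%s %.2f\n", product, sales[product] }' sales_sample.csv > sales_totals.txt

cat sales_totals.txt
```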
    [intern@intern-a1t-inf-lnx1 ~]$ awk -F ',' 'NR > 1 {sales[$1] += $2 * $3} END {max_sales = 0; max_product = "";
    for (product in sales) {
      if (sales[product] > max_sales) {
        max_sales = sales[product];
        max_product = product;
      }
    }
    print "Product with the highest sales:", max_product, "Total Sales:", max_sales
    }' sales.csv
    Product with the highest sales: Laptop Total Sales: 5999.94
    
  • The only difference from the previous command is the END block. This time it loops through the array, keeps the highest total seen so far in max_sales (and the matching product in max_product), and prints them once the loop finishes.
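
An alternative to tracking the maximum inside awk is to let sort do the ranking: print product/total pairs, sort numerically on the total in descending order, and keep the first line with head. This is a different technique from the loop above, sketched here with the same hypothetical sales_sample.csv copy. If two products tie for the highest total, either approach reports only one of them.

```shell
#!/bin/sh
# Hypothetical copy of the sample data in a separate file.
cat > sales_sample.csv <<'EOF'
Device,Quantity,Price
Smartphone-A,10,599.99
Tablet,12,399.99
Smartphone-B,8,599.99
Laptop,6,999.99
Headphones-A,3,199.99
Headphones-B,7,159.99
EOF

# Print "product total" pairs, sort on field 2 numerically (n) in reverse (r),
# and keep the first line. LC_ALL=C keeps "." as the decimal point for sort -n.
top=$(awk -F',' 'NR > 1 { sales[$1] += $2 * $3 }
  END { for (product in sales) print product, sales[product] }' sales_sample.csv |
  LC_ALL=C sort -k2,2nr | head -n 1)

echo "Product with the highest sales: $top"
```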

Task 4: Advanced awk Usage

  • Create a text file named grades.txt containing student names and their corresponding grades.
  • Use awk to:
  • Calculate the average grade.
  • Find students who scored below a certain grade.
  • Display the student with the highest grade.
    # Contents of grades.txt (student names and grades)
    [intern@intern-a1t-inf-lnx1 ~]$ cat grades.txt
    Alice 92
    Bob 85
    Charlie 78
    David 95
    Eve 88
    Frank 72
    Grace 96
    Hank 64
    Ivy 90
    Jack 89
    
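    # Calculate the average grade using awk
    [intern@intern-a1t-inf-lnx1 ~]$ awk '{ total += $2 } END { average = total / NR; print "Average Grade:", average }' grades.txt
    Average Grade: 84.9
    
    # Find students who scored below a certain grade (e.g., below 80)
    [intern@intern-a1t-inf-lnx1 ~]$ awk '$2 < 80 { print $1, "Scored Below 80" }' grades.txt
    Charlie Scored Below 80
    Frank Scored Below 80
    Hank Scored Below 80
    
    # Display the student with the highest grade
    [intern@intern-a1t-inf-lnx1 ~]$ awk 'NR == 1 { max_grade = $2; top_student = $1 } $2 > max_grade { max_grade = $2; top_student = $1 } END { print "Top Student:", top_student, "Grade:", max_grade }' grades.txt
    Top Student: Grace Grade: 96
    
  • In the END block, NR holds the total number of lines read, so total / NR is the average.
  • The NR == 1 block seeds max_grade and top_student from the first line, so the comparison starts from a real grade instead of an assumed minimum.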
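
The below-80 check hard-codes its cutoff, and the average divides by NR without checking for an empty file. A slightly more reusable sketch passes the cutoff in with awk's -v option and only divides when at least one line was read. The file grades_sample.txt, its four entries, and the variable names cutoff and limit are hypothetical.

```shell
#!/bin/sh
# Hypothetical four-student sample, written to its own file.
printf '%s\n' 'Alice 92' 'Bob 85' 'Charlie 78' 'Frank 72' > grades_sample.txt

# -v copies the shell variable cutoff into the awk variable limit,
# so the threshold is not hard-coded in the awk program.
cutoff=80
awk -v limit="$cutoff" '$2 < limit { print $1, "scored below", limit }' grades_sample.txt > below.txt

# Only divide when at least one line was read; %.2f rounds to two decimals.
avg=$(awk '{ total += $2 } END { if (NR > 0) printf "%.2f\n", total / NR }' grades_sample.txt)

cat below.txt
echo "Average Grade: $avg"
```

Changing cutoff=80 to another value reuses the same awk program without editing it.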
    

Conclusion

In this lab exercise, you practiced text manipulation with the sed and awk commands in a Linux environment. These commands are powerful tools for processing and transforming text. You performed basic and advanced tasks such as find and replace, pattern-based line deletion, and aggregating values from delimited files. These skills are valuable for log file processing, data cleaning, and quick data analysis.