Practice Exercise: Text Manipulation with sed and awk
Objective
Practice text manipulation techniques using the sed and awk commands in a Linux environment.
Task 1: Introduction to sed
- Open a terminal window.
- Let's use the vi_journal.txt file created earlier, which has the following content:

    vi, a powerful and versatile text editor, is renowned for its efficient search and replace functionality. With vi, you can effortlessly locate specific words or phrases within your document using its robust search feature, and then replace them seamlessly with new content using the replace command. Whether you're editing code, configuring files, or crafting documents, vi's search and replace capabilities empower you to make precise and rapid changes, enhancing your productivity and control over your text editing tasks.

- Use sed to replace a word or phrase in the vi_journal.txt file with another word. Save the result as modified.txt.

    [intern@intern-a1t-inf-lnx1 ~]$ sed 's/vi/emacs/g' vi_journal.txt > modified.txt
    [intern@intern-a1t-inf-lnx1 ~]$ cat modified.txt
    emacs, a powerful and versatile text editor, is renowned for its efficient search and replace functionality. With emacs, you can effortlessly locate specific words or phrases within your document using its robust search feature, and then replace them seamlessly with new content using the replace command. Whether you're editing code, configuring files, or crafting documents, emacs's search and replace capabilities empower you to make precise and rapid changes, enhancing your productiemacsty and control over your text editing tasks.
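
The output above shows that s/vi/emacs/g also rewrites the "vi" inside "productivity" (giving "productiemacsty"), because sed matches substrings and not whole words. Below is a minimal sketch of one way to limit the match to the standalone word. It assumes GNU sed, since the \b word-boundary escape is a GNU extension and not part of POSIX. The sample sentence and the file name sample.txt are made up for illustration.

```shell
# Hypothetical sample file, a shortened stand-in for vi_journal.txt
workdir=$(mktemp -d)
printf '%s\n' "With vi, you gain productivity; vi's commands are fast." > "$workdir/sample.txt"

# \b matches a word boundary (GNU sed extension), so only the
# standalone word "vi" is replaced and "productivity" is left alone
whole_word=$(sed 's/\bvi\b/emacs/g' "$workdir/sample.txt")
echo "$whole_word"
# prints: With emacs, you gain productivity; emacs's commands are fast.
```

On a sed without \b support, the pattern would need to be written differently, so it is worth checking the result with cat before relying on it.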
Task 2: Advanced sed Usage
- Create a new text file named data.txt with several lines of data.
- Use sed to perform the following tasks:
  - Delete specific lines containing a certain pattern from data.txt.
  - Replace text in data.txt using regular expressions.
  - Append new text to the end of lines in data.txt.

    [intern@intern-a1t-inf-lnx1 ~]$ cat data.txt
    This is line 1.
    This is line 2 with some patterns.
    This is line 3.
    Pattern ABC should be removed from this line.
    This is line 5 with another pattern.
    Replace XYZ with ZZZ in this line.
    This is line 7.
    [intern@intern-a1t-inf-lnx1 ~]$ sed '/ABC/d' data.txt > data2.txt
    [intern@intern-a1t-inf-lnx1 ~]$ cat data2.txt
    This is line 1.
    This is line 2 with some patterns.
    This is line 3.
    This is line 5 with another pattern.
    Replace XYZ with ZZZ in this line.
    This is line 7.
    [intern@intern-a1t-inf-lnx1 ~]$ sed 's/XYZ/ZZZ/g' data2.txt > data3.txt
    [intern@intern-a1t-inf-lnx1 ~]$ cat data3.txt
    This is line 1.
    This is line 2 with some patterns.
    This is line 3.
    This is line 5 with another pattern.
    Replace ZZZ with ZZZ in this line.
    This is line 7.
    [intern@intern-a1t-inf-lnx1 ~]$ echo 'Appending a new line' | sed '$a\' >> data3.txt
    [intern@intern-a1t-inf-lnx1 ~]$ cat data3.txt
    This is line 1.
    This is line 2 with some patterns.
    This is line 3.
    This is line 5 with another pattern.
    Replace ZZZ with ZZZ in this line.
    This is line 7.
    Appending a new line
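
In the session above, the new line reaches data3.txt through the shell's >> redirection, so it adds one line to the file and does not add text to each existing line. If the goal is to append text to the end of every line, one common approach is to substitute at the end-of-line anchor $. The sketch below also shows a replacement that uses a regular expression with a capture group. It assumes a sed that supports -E for extended regular expressions (GNU and BSD sed both do). The sample lines, the file name sample.txt, and the " [checked]" suffix are illustrative only.

```shell
# Hypothetical two-line sample in the style of data.txt
workdir=$(mktemp -d)
printf '%s\n' 'This is line 1.' 'Replace XYZ with ZZZ in this line.' > "$workdir/sample.txt"

# First -e: rewrite "line <number>" to "row <number>" using a capture group.
# Second -e: $ matches the end of each line, so the suffix is appended there.
appended=$(sed -E -e 's/line ([0-9]+)/row \1/' -e 's/$/ [checked]/' "$workdir/sample.txt")
echo "$appended"
# prints:
# This is row 1. [checked]
# Replace XYZ with ZZZ in this line. [checked]
```

The second line keeps the word "line" because it is followed by a period, not by a space and a number, so the first expression does not match it.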
Task 3: Introduction to awk
- Create a sample CSV file named sales.csv with some sample sales data (e.g., product, quantity, price).
- Use awk to:
  - Calculate the total sales for each product.
  - Find the product with the highest sales.

    [intern@intern-a1t-inf-lnx1 ~]$ cat sales.csv
    Device,Quantity,Price
    Smartphone-A,10,599.99
    Tablet,12,399.99
    Smartphone-B,8,599.99
    Laptop,6,999.99
    Headphones-A,3,199.99
    Headphones-B,7,159.99
    [intern@intern-a1t-inf-lnx1 ~]$ awk -F',' 'NR > 1 {sales[$1] += $2 * $3} END {for (product in sales) print product, sales[product]}' sales.csv
    Tablet 4799.88
    Smartphone-A 5999.9
    Smartphone-B 4799.92
    Headphones-A 599.97
    Headphones-B 1119.93
    Laptop 5999.94

- The command above may need some explanation:
  - -F',' sets the field delimiter to , instead of the default whitespace.
  - NR > 1 skips the header line.
  - {sales[$1] += $2 * $3} multiplies the quantity by the price and adds the result to the sales array, with the product name as the key. In effect, sales["Tablet"] += Quantity * Price.
  - We used += in case a product is listed more than once. Since this data has no duplicate products, = would also work.
  - END runs the block that follows only after all lines have been processed. In this case, it loops through the array and prints each entry.
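
To illustrate why += matters, here is a small sketch with a product that appears on two rows. The file dup_sales.csv and its values are made up for this example. The output is piped through sort because awk does not guarantee any particular order for a for-in loop over an array, which is also why the totals in the session above are not in file order.

```shell
# Hypothetical data: Tablet is listed twice
workdir=$(mktemp -d)
cat > "$workdir/dup_sales.csv" <<'EOF'
Device,Quantity,Price
Tablet,2,100
Laptop,1,500
Tablet,3,100
EOF

# += adds each row's revenue to the running total for that product,
# so the two Tablet rows (200 and 300) are combined into 500
totals=$(awk -F',' 'NR > 1 { sales[$1] += $2 * $3 }
     END { for (product in sales) print product, sales[product] }' "$workdir/dup_sales.csv" | sort)
echo "$totals"
# prints:
# Laptop 500
# Tablet 500
```

With = in place of +=, the second Tablet row would overwrite the first and the Tablet total would be 300.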

    [intern@intern-a1t-inf-lnx1 ~]$ awk -F ',' 'NR > 1 {sales[$1] += $2 * $3} END {max_sales = 0; max_product = ""; for (product in sales) { if (sales[product] > max_sales) { max_sales = sales[product]; max_product = product; } } print "Product with the highest sales:", max_product, "Total Sales:", max_sales }' sales.csv
    Product with the highest sales: Laptop Total Sales: 5999.94

- The only difference from the previous command is the END block. This time it loops through the array, compares each product's total with the running maximum stored in a variable (max_sales), and keeps whichever is higher along with its product name.
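
As an alternative to tracking the maximum inside awk, the per-product totals can be handed to sort and head. This is only a sketch of a different technique, not the approach used in the session above. It recreates sales.csv in a temporary directory so that it runs on its own, and it sets LC_ALL=C on sort so that the decimal point is interpreted consistently.

```shell
workdir=$(mktemp -d)
cat > "$workdir/sales.csv" <<'EOF'
Device,Quantity,Price
Smartphone-A,10,599.99
Tablet,12,399.99
Smartphone-B,8,599.99
Laptop,6,999.99
Headphones-A,3,199.99
Headphones-B,7,159.99
EOF

# OFS=',' makes awk print "product,total"; sort orders the rows by
# field 2, numerically and in descending order; head keeps the top row
top=$(awk -F',' -v OFS=',' 'NR > 1 { sales[$1] += $2 * $3 }
     END { for (product in sales) print product, sales[product] }' "$workdir/sales.csv" |
  LC_ALL=C sort -t',' -k2,2nr | head -n 1)
echo "$top"
# prints: Laptop,5999.94
```

Changing head -n 1 to head -n 3 would list the three best-selling products, which the single-variable approach cannot do without more code.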
Task 4: Advanced awk Usage
- Create a text file named grades.txt containing student names and their corresponding grades.
- Use awk to:
  - Calculate the average grade.
  - Find students who scored below a certain grade.
  - Display the student with the highest grade.

    # Show the contents of grades.txt (student names and grades)
    [intern@intern-a1t-inf-lnx1 ~]$ cat grades.txt
    Alice 92
    Bob 85
    Charlie 78
    David 95
    Eve 88
    Frank 72
    Grace 96
    Hank 64
    Ivy 90
    Jack 89
    # Calculate the average grade using awk
    [intern@intern-a1t-inf-lnx1 ~]$ awk '{ total += $2 } END { average = total / NR; print "Average Grade:", average }' grades.txt
    # Find students who scored below a certain grade (e.g., below 80)
    [intern@intern-a1t-inf-lnx1 ~]$ awk '$2 < 80 { print $1, "Scored Below 80" }' grades.txt
    # Display the student with the highest grade
    [intern@intern-a1t-inf-lnx1 ~]$ awk 'NR == 1 { max_grade = $2; top_student = $1 } $2 > max_grade { max_grade = $2; top_student = $1 } END { print "Top Student:", top_student, "Grade:", max_grade }' grades.txt
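
The commands above hard-code the cutoff of 80 and print the average with awk's default number formatting. The sketch below shows two small variations under stated assumptions: the cutoff is passed in with -v (the variable name limit is chosen for this example), and the average is printed with printf to two decimals, with a guard so that an empty file does not cause a division by zero. It recreates grades.txt in a temporary directory so that it runs on its own.

```shell
workdir=$(mktemp -d)
cat > "$workdir/grades.txt" <<'EOF'
Alice 92
Bob 85
Charlie 78
David 95
Eve 88
Frank 72
Grace 96
Hank 64
Ivy 90
Jack 89
EOF

# -v sets the awk variable "limit" before the program runs,
# so the cutoff can change without editing the awk program
below=$(awk -v limit=80 '$2 < limit { print $1 }' "$workdir/grades.txt")
echo "$below"
# prints:
# Charlie
# Frank
# Hank

# printf fixes the output at two decimals; NR > 0 skips the division for an empty file
average=$(awk '{ total += $2 } END { if (NR > 0) printf "%.2f\n", total / NR }' "$workdir/grades.txt")
echo "$average"
# prints: 84.90
```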
Conclusion
In this lab exercise, you've practiced text manipulation using the sed and awk commands in a Linux environment. These commands are powerful tools for processing and transforming text data. You've learned how to perform basic and advanced tasks such as find and replace, pattern matching, and data analysis. These skills are valuable for tasks like log file processing, data cleaning, and data analysis in a Linux environment.