Sunday 6 November 2016

Linux: How to count duplicate lines in a file from the terminal

Linux has many commands that are useful for processing and analyzing files. In this post I will explain a simple pipeline that prints the number of times each line occurs in a file.

So here is the command:

terminal$ sort yourfilename.txt | uniq -c

Here yourfilename.txt is just an example name; any file will do.
Suppose the contents of yourfilename.txt are:

line1
line1
line2
line3
line1
line3

Output:

3 line1
1 line2
2 line3

Explanation:

The sort command puts identical lines next to each other, and its output is piped to uniq. This matters because uniq only compares adjacent lines, so its input has to be sorted first (easy to forget, so keep it in mind). The -c option makes uniq prefix each distinct line with the number of times it occurs.
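
To see why the sort step matters, here is a small sketch that runs uniq -c on the sample data with and without sorting. The temporary file and the variable name sample are only assumptions for this illustration; they stand in for yourfilename.txt. The last command is an optional extra that is not part of the original pipeline: a second sort -nr should list the most frequent lines first.

```shell
# Hypothetical temporary file holding the same six lines as the example above.
sample=$(mktemp)
printf 'line1\nline1\nline2\nline3\nline1\nline3\n' > "$sample"

# Without sort: uniq only merges adjacent repeats, so line1 and line3
# each show up twice with partial counts.
uniq -c "$sample"

# With sort: identical lines are adjacent, so each distinct line
# appears once with its total count.
sort "$sample" | uniq -c

# Optional: order the result by count, highest first.
sort "$sample" | uniq -c | sort -nr
```

The exact spacing in front of each count can differ between uniq implementations, so do not rely on it when parsing the output.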



