The below expression should work correctly to find any number of duplicated words.

    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    // Check for subsequences of input that match the compiled pattern
    while (m.find()) {
        input = input.replaceAll(m.group(0), m.group(1));
    }

(\s+\1\b)*: one or more whitespace characters followed by a word that matches the previous word (group 1) and ends on a word boundary. Wrapping the whole group in * lets it match any number of repetitions, not just one.

m.group(0): contains the whole matched run, e.g. Goodbye goodbye GooDbYe.
m.group(1): contains the first word of the matched run, e.g. Goodbye.

The replace call then replaces all the consecutive matched words with the first instance of the word.

Since some developers come to this page looking for a solution that eliminates not only duplicate consecutive non-whitespace substrings but also triplicates and beyond, I'll show the adapted pattern.

Replace: $1 (replaces the full-string match with capture group #1)

This pattern greedily matches a "whole" non-whitespace substring, then requires one or more copies of that substring, each separated from the previous one by one or more whitespace characters (space, tab, newline, etc.). The \b (word boundary) tokens are vital to ensure partial words are not matched. The second parenthetical is a non-capturing group, because this variable-width substring does not need to be captured, only matched/absorbed. The + (one or more) quantifier on the non-capturing group is more appropriate than *, because * would make the regex engine match and replace singleton occurrences (each lone word is replaced by itself), which is wasteful pattern design.

*Note: if you are dealing with sentences or input strings with punctuation, the pattern will need to be refined further.
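A minimal, self-contained sketch of the Matcher-based approach. It assumes the full expression is \b(\w+)(\s+\1\b)*; only the trailing (\s+\1\b)* group is spelled out in the text, so the leading \b(\w+) is an assumption consistent with how group 1 is described. The class and method names (DuplicateWords, removeConsecutiveDuplicates) are hypothetical. Rather than calling String.replaceAll with the matched text (which reinterprets that text as a new regex and rescans the whole string), this version rebuilds the output with Matcher.appendReplacement/appendTail, swapping each run (group 0) for its first word (group 1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateWords {
    // ASSUMED full pattern: a captured word, then any number of
    // "whitespace + the same word" repetitions, each ending on a word boundary.
    private static final String REGEX = "\\b(\\w+)(\\s+\\1\\b)*";

    static String removeConsecutiveDuplicates(String input) {
        // In Java, CASE_INSENSITIVE also makes the \1 back-reference case-insensitive.
        Pattern p = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        StringBuffer out = new StringBuffer();
        // Check for subsequences of input that match the compiled pattern
        while (m.find()) {
            // m.group(0) is the whole run, e.g. "Goodbye goodbye GooDbYe";
            // m.group(1) is its first word, e.g. "Goodbye".
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1)));
        }
        // Copy whatever follows the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeConsecutiveDuplicates("Goodbye goodbye GooDbYe"));
    }
}
```

Running main prints Goodbye. Because of the * quantifier, every lone word also matches and is replaced by itself; m.replaceAll("$1") would be an equivalent one-liner for the loop.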
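The adapted pattern itself does not appear in the text, so this Java sketch assumes a reconstruction from its description: \b(\S+)(?:\s+\1\b)+, i.e. a captured "whole" non-whitespace substring followed by one or more non-capturing copies, with $1 as the replacement. Compiling with Pattern.CASE_INSENSITIVE is an extra assumption, chosen to mirror the first answer's behaviour; the class name CollapseRepeats is hypothetical.

```java
import java.util.regex.Pattern;

class CollapseRepeats {
    // ASSUMED reconstruction of the adapted pattern:
    //   \b(\S+)       capture one whole non-whitespace substring (group 1)
    //   (?:\s+\1\b)+  one or more further copies, each preceded by whitespace;
    //                 non-capturing, because it is only absorbed, never reused
    // CASE_INSENSITIVE is also an assumption; drop it for case-sensitive matching.
    private static final Pattern REPEATS =
            Pattern.compile("\\b(\\S+)(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    static String collapse(String input) {
        // Replace each whole run with capture group #1 ("$1").
        return REPEATS.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("this is is is a test test"));
    }
}
```

Running main prints this is a test. With + instead of *, lone words such as a never match at all, so no pointless self-replacement happens.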