R4R Style Learning Portal

How to find duplicate characters in a string in Java 8

The Java program below finds the characters that occur more than once in a given string, using the Stream API and a HashSet.
The listing is followed by a step-by-step explanation of how it works:
 
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class DuplicateCharactersInAString {
    public static void main(String[] args){
        String input = "java is programing language";
        // Tracks every character seen so far. The filter lambda below mutates this set,
        // which is only safe because the stream is sequential.
        Set<String> set = new HashSet<>();
        Set<String> duplicateCharacters = Arrays.stream(input.split("")) // One-character strings, spaces included
                .filter(ch -> !set.add(ch)) // add() returns false for a character already in 'set', i.e. a repeat
                .collect(Collectors.toSet()); // A Set keeps each repeated character only once
        System.out.println(duplicateCharacters);
    }
}
Explanation
  1. Imports:
    • java.util.Arrays: Provides utility methods to manipulate arrays, including Arrays.stream() to convert an array into a stream.
    • java.util.HashSet: A class implementing the Set interface, which stores unique elements and does not allow duplicates.
    • java.util.Set: An interface representing a collection that cannot contain duplicate elements.
    • java.util.stream.Collectors: Provides various static methods for implementing reduction operations on streams, including Collectors.toSet() to collect elements into a Set.
  2. input.split(""):
    • The split("") method divides the input string into an array of individual characters, including spaces. Each element in the array is a String of length 1, representing a single character.
  3. Arrays.stream(...):
    • This converts the array of character strings into a stream, enabling the use of Stream API operations.
  4. .filter(ch -> !set.add(ch)):
    • This is the core of the duplicate detection logic.
    • set.add(ch): Attempts to add the current character (ch) to the set.
      • If ch is not already present in set, it's added successfully, and set.add(ch) returns true.
      • If ch is already present in set (meaning it's a duplicate), it's not added again, and set.add(ch) returns false.
    • !set.add(ch): The negation operator (!) reverses the boolean value.
      • If set.add(ch) is true (unique character), !true is false, so the character is filtered out.
      • If set.add(ch) is false (duplicate character), !false is true, so the character is included in the filtered stream.
    • Therefore, the filter operation keeps only the duplicate characters in the stream.
  5. .collect(Collectors.toSet()):
    • This collects the filtered stream into a Set. A character that occurs three or more times passes the filter more than once (every occurrence after the first is rejected by set.add), but because a Set stores each element only once, it appears a single time in duplicateCharacters. The resulting Set does not guarantee any particular element order.
  6. System.out.println(duplicateCharacters);:
    • This prints the Set of characters that occur more than once in the input string, each listed once.
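
Steps 4 and 5 can be traced on a shorter input. The sketch below is illustrative and not part of the original program: the class name FilterStepsDemo, the helper rejectedByAdd, and the input "banana" are assumptions chosen for the demonstration. It collects the filtered stream into a List first, so the repeated rejections are visible before a Set removes them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical demo class; the names here are illustrative only.
class FilterStepsDemo {

    // Returns every character rejected by Set.add, in encounter order,
    // keeping repeats so the effect of the filter step is visible.
    static List<String> rejectedByAdd(String input) {
        Set<String> seen = new HashSet<>();
        return Arrays.stream(input.split(""))
                .filter(ch -> !seen.add(ch))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        System.out.println(seen.add("a")); // true: "a" was not in the set yet
        System.out.println(seen.add("a")); // false: "a" is already present

        List<String> rejected = rejectedByAdd("banana");
        System.out.println(rejected);                // [a, n, a]
        System.out.println(new TreeSet<>(rejected)); // [a, n]
    }
}
```

In "banana" the letter 'a' occurs three times, so it is rejected twice by the filter; collecting into a Set (a sorted TreeSet here, only to make the printed order predictable) reduces that to one entry.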
 
Output
Given the input string "java is programing language", six characters occur more than once: 'a', 'g', 'i', 'n', 'r', and the space character. On a typical OpenJDK run the program prints:
 
[ , a, r, g, i, n]
Note: The first element is the space character. It is included because split("") treats a space like any other character, and it occurs three times in the input. The letter 'e' is not included because it occurs only once. The order of the elements may differ between Java implementations, because the Set returned by Collectors.toSet() does not guarantee iteration order.
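
To check which characters really repeat in the sample input, and how often, the occurrences can be counted directly. This is an alternative sketch, not the approach used in the listing above: the class name DuplicateCharacterCounts and the method duplicateCounts are hypothetical, and it swaps the HashSet filter for Collectors.groupingBy with Collectors.counting. A LinkedHashMap is used so the result follows the order of first appearance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical alternative; names are illustrative only.
class DuplicateCharacterCounts {

    // Counts every character, then drops the ones that occur only once.
    static Map<Character, Long> duplicateCounts(String input) {
        Map<Character, Long> counts = input.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));
        counts.values().removeIf(n -> n < 2); // removing a value also removes its key
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(duplicateCounts("java is programing language"));
        // {a=5,  =3, i=2, r=2, g=4, n=2}
    }
}
```

Because this version keeps no mutable state outside the stream pipeline, it does not depend on the stream being sequential, at the cost of building a count for every character.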