BekoC: Algorithms: String based algorithm questions

in this post, I try to implement the following string-related algorithm questions, note that as i mentioned before these questions are taken from Arden's blog, he implemented questions by using python (Please check his blog for more detailed explanations) I would like to implement using Java and try to make additional comments which I think that are useful for understanding the implementations of some type of data structures in Java programming language.

Non Repeated Characters in String : return the unique characters in a given letter.
Anagram Strings: given two strings, check if they’re anagrams or not. Two strings are anagrams if they are written using the same exact letters, ignoring space, punctuation and capitalization. Each letter should have the same count in both strings. For example, ‘Eleven plus two’ and ‘Twelve plus one’ are meaningful anagrams of each other.
Find Word Positions in Text : Given a text file and a word, find the positions that the word occurs in the file. We’ll be asked to find the positions of many words in the same file.
Remove Duplicate Characters in String: Remove duplicate characters in a given string keeping only the first occurrences. For example, if the input is ‘tree traversal’ the output will be ‘tre avsl’.

In order to understand these types of questions, it requires to store unique key and count of values each, especially basic understanding of hashmap is essential. Please check here before you start reviewing the codes.

Ok. Let's start.

1- Non Repeated Characters in String : return unique characters in a given String

In this question, we use hashmap and complexity is :

$O(n) where~n ~is ~the ~length ~of~ a ~given ~word.$

2- Anagram Strings: given two strings, check if they’re anagrams or not. Two strings are anagrams if they are written using the same exact letters, ignoring space, punctuation and capitalization. Each letter should have the same count in both strings. For example, ‘Eleven plus two’ and ‘Twelve plus one’ are meaningful anagrams of each other.

Two implementations have coded, basic knowledge of Multiset is required to understand the second implementation which reduces the complexity to O(n).

To test:

boolean result = isAnagram("Eleven plus two", "Twelve plus one");

boolean result2 = isAnagramMultiset("ad", "daas");

3- Find Word Positions in Text : Given a text file and a word, find the positions that the word occurs in the file. We’ll be asked to find the positions of many words in the same file.

For this questions, we use a text instead of file to find the given word's position. We create a map and to store the values, arraylist is used. For a given key, if there are more than one appearance of a value, positions are appended to the arraylist.

Map<String, ArrayList<Integer>> map = new HashMap<String, ArrayList<Integer>>();

we can test by :

findWordPosition("have a nice try did you try try try", "try");

4-Remove Duplicate Characters in String: Remove duplicate characters in a given string keeping only the first occurrences. For example, if the input is ‘tree traversal’ the output will be ‘tre avsl’.

This question best fits with use of sets, you can check the cheatsheet of java collections to make decision which collection is better for a given problem.

 Set<Character> set = new LinkedHashSet<Character>();

 Set<Character> set = removeDuplicateChars("tree traversal");
        for (Character entry : set) {
            System.out.println(entry);
        }

output of problem 3 and 4

Implementations of 4 questions

/*
 *@author beck 
 *@date Jul 27, 2012 
 *StringAlgorithms 
 *bekoc.blogspot.com 
 */
package stringalgorithms;

import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import java.util.*;

public class StringAlgorithms {

    public static void main(String[] args) {

        //1-Non Repeated Characters in String :  return the  unique  characters in a given letter.
        ArrayList<String> uniqueCharsResults = uniqueLetters("teesstedbyyourself");
        for (int i = 0; i < uniqueCharsResults.size(); i++) {
            System.out.print(uniqueCharsResults.get(i) + " ");
        }

        //2- Anagram Strings: given two strings, check if they’re anagrams or not.
        //Two strings are anagrams if they are written using the same exact letters, ignoring space, punctuation and capitalization. 
        //Each letter should have the same count in both strings. 
        //For example, ‘Eleven plus two’ and ‘Twelve plus one’ are meaningful anagrams of each other.
        //complexity is logn
        boolean result = isAnagram("Eleven plus two", "Twelve plus one");
        System.out.println("Anagram Test using sorting, Result is: " + result);
        //to reduce the complexity linear time, n, we can use hashtable
        //first store Text1 in a hastable 
        //(key, value pairs where key is the each character in Text1 
        //and value is the count of the characters) and then go thru Text2 one by one
        //character and decrease the count of the text1, if all counts of text 1 is 0 then
        //can be called as anagram Strings.
        boolean result2 = isAnagramMultiset("ad", "daas");
        System.out.println("Anagram Test using Multiset, Result is: " + result2);

        //3- Find Word Positions in Text : Given a text file 
        //and a word, find the positions that the word 
        //occurs in the file. We’ll be asked to find the positions of 
        //many words in the same file.
        findWordPosition("have a nice try did you try try try", "try");

        //4-Remove Duplicate Characters in String: Remove duplicate characters 
        //in a given string keeping only the first occurrences. For example, 
        //if the input is ‘tree traversal’ the output will be ‘tre avsl’.
        Set<Character> set = removeDuplicateChars("tree traversal");
        for (Character entry : set) {
            System.out.println(entry);
        }

    }

    public static ArrayList<String> uniqueLetters(String text) {
        //create a hashmap with key and value pairs, where key is the each letter and
        //value is the count of each letter of a given text
        Map<Character, Integer> map = new HashMap<Character, Integer>();
        //to store unique letters create an arraylist
        ArrayList<String> uniqueChars = new ArrayList<String>();
        //we check each letter in a given text, so in this problem we don't need StringTokenizer
        //StringTokenizer tokenizer = new StringTokenizer(text);
        for (int i = 0; i < text.length(); ++i) {
            if (map.containsKey(text.charAt(i))) {
                int count = map.get(text.charAt(i));
                map.put(text.charAt(i), count + 1);
            } else {
                map.put(text.charAt(i), 1);
            }
        }
        // Loop in a hashSet to get keys and values
        Iterator iterator = map.entrySet().iterator();
        while (iterator.hasNext()) {
            Map.Entry pairs = (Map.Entry) iterator.next();
            System.out.println(pairs.getKey() + " = " + pairs.getValue() + " times appeared");
            if (pairs.getValue().equals(1)) {
                uniqueChars.add(pairs.getKey().toString());
            }
            iterator.remove(); // avoids a ConcurrentModificationException
        }
        return uniqueChars;
    }

    public static boolean isAnagram(String text1, String text2) {
        //1st  method is to sort both strings and check one by one
        //sorting adds an complexity of nlogn (e.g. merge sort)
        //this code considers space, punctuation etc.
        //getChars method  may also be used here
        System.out.println("");
        if (text1.length() != text2.length()) {
            System.out.println("inside isAnagram: " + text1.length() + " " + text2.length());
            return false;
        } else {
            String sorted1 = convertToCharArrayFromString(1, text1);// 1 is for sorted return 
            String sorted2 = convertToCharArrayFromString(1, text2);
            for (int i = 0; i < sorted1.length(); i++) {
                if (sorted1.charAt(i) != sorted2.charAt(i)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static String convertToCharArrayFromString(int option, String text) {
        char[] content1 = text.toLowerCase().toCharArray();
        if (option == 1) {
            Arrays.sort(content1);
            String sorted1 = new String(content1);
            return sorted1;
        } else {
            String unsorted1 = new String(content1);
            return unsorted1;
        }
    }

    public static boolean isAnagramMultiset(String text1, String text2) {
        //requires basic knowledge of Multiset from guova libraries
        //may be implemented using pure JDK specifications
        if (text1.length() != text2.length()) {
            return false;
        }
        Multiset<Character> chars = HashMultiset.create();
        String unsorted1 = convertToCharArrayFromString(0, text1);
        String unsorted2 = convertToCharArrayFromString(0, text2);
        for (int i = 0; i < unsorted1.length(); i++) {
            chars.add(unsorted1.charAt(i));
        }
        for (int i = 0; i < unsorted1.length(); i++) {
            chars.remove(unsorted2.charAt(i), 1);
            System.out.println("Multiset " + chars.count(unsorted2.charAt(i)));
            if (chars.count(unsorted2.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void findWordPosition(String text, String word) {
        Map<String, ArrayList<Integer>> map = createInvertedIndex(text);
        // for (Map.Entry<String, ArrayList<Integer>> entry : map.entrySet()) {
        ArrayList<Integer> position = map.get(word);
        // System.out.println("Key " + entry.getKey() + entry.getValue());
        for (Integer value : position) {
            System.out.println("Key: " + word + " its positions " + value);
        }
    }

    public static Map<String, ArrayList<Integer>> createInvertedIndex(String text) {
        //assume that text is given with appropriate
        //letters so no need for regular expressions
        //we can create a map with a key of String (each word) and
        //their positions that is integer array
        Map<String, ArrayList<Integer>> map = new HashMap<String, ArrayList<Integer>>();
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            if (map.containsKey(words[i])) {
                ArrayList<Integer> position = map.get(words[i]);
                position.add(i);
                map.put(words[i], position);
            } else {
                ArrayList<Integer> position = new ArrayList<Integer>();
                position.add(i);
                map.put(words[i], position);
            }
        }
        return map;
    }

    public static Set<Character> removeDuplicateChars(String text) {
        // for insertion sort order LinkedHashSet is used
        Set<Character> set = new LinkedHashSet<Character>();
        for (int i = 0; i < text.length(); i++) {
            if (!set.contains(text.charAt(i))) {
                set.add(text.charAt(i));
            }
        }
        return set;
    }
}

BekoC: Algorithms

Pages

Friday, July 27, 2012

String based algorithm questions

1- Non Repeated Characters in String : return unique characters in a given String

Implementations of 4 questions

No comments:

Post a Comment