Java 8 Streams


Bhaskar S 10/24/2014


A typical Java application will invariably use some form of Collection class to iterate over a sequence of data elements and perform operation(s) on each of them.

Imagine we had a Collection with a large number of data elements. Iterating over them and processing them one by one on a single thread would leave the other CPU cores idle. A more efficient approach would be to slice up the large Collection into smaller chunks and process each chunk in a separate thread. However, writing multi-threaded code is complex and error-prone.

What if Java provided an out-of-the-box capability to automatically iterate over and process a large Collection of data elements in parallel without one having to write a single line of multi-threaded code ???

This is exactly what Java 8 Streams does !!!

In other words, Streams is a new capability in Java 8 that can perform operation(s) on a Collection of data elements, either in sequential or in parallel mode.

Without further ado, let's jump right into some examples to illustrate the power of Streams.

The following is a simple program that filters names beginning with the letter "j" using the traditional Java language constructs:

Listing.1
/*
 * 
 * Name:   StreamsTest
 * 
 * Author: Bhaskar S
 * 
 * Date:   10/24/2014
 * 
 */

package com.polarsparc.java8;

import java.util.List;
import java.util.Arrays;

public class StreamsTest {
    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "alice",
            "bob",
            "charlie",
            "joe",
            "john"
        );
        
        for (String n : names) {
            System.out.println("<1> Checking n = " + n);
            if (n.startsWith("j")) {
                System.out.println("<1> Starts with j: " + n);
            }
        }
    }
}

Executing the program from Listing.1 will generate the following output:

Output.1

<1> Checking n = alice
<1> Checking n = bob
<1> Checking n = charlie
<1> Checking n = joe
<1> Starts with j: joe
<1> Checking n = john
<1> Starts with j: john

The following is the same simple program that filters names beginning with the letter "j" using Java 8 Streams:

Listing.2
/*
 * 
 * Name:   StreamsTest2
 * 
 * Author: Bhaskar S
 * 
 * Date:   10/24/2014
 * 
 */

package com.polarsparc.java8;

import java.util.Arrays;

public class StreamsTest2 {
    public static void main(String[] args) {
        Arrays.stream(new String[] {
            "alice",
            "bob",
            "charlie",
            "joe",
            "john"
            })
            .filter(n -> {
                System.out.println("<2> Checking n = " + n);
                return n.startsWith("j");
            })
            .forEach(n -> System.out.println("<2> Starts with j: " + n));
    }
}

Executing the program from Listing.2 will generate the following output:

Output.2

<2> Checking n = alice
<2> Checking n = bob
<2> Checking n = charlie
<2> Checking n = joe
<2> Starts with j: joe
<2> Checking n = john
<2> Starts with j: john

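The code in Listing.2 needs a little explanation. Arrays.stream() wraps the array of names in a Stream<String>. The filter() operation is an intermediate operation: it takes a predicate (here a lambda expression) and passes along only the elements for which the predicate returns true. The forEach() operation is a terminal operation: it triggers the processing of the stream and applies the given action to each element that made it through the filter. Notice from Output.2 that each name flows through the whole pipeline before the next name is checked.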

The code in Listing.2 is far more elegant compared to Listing.1.
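The interleaved lines in Output.2 hint at another property of Streams: intermediate operations such as filter() are lazy and do nothing until a terminal operation is invoked. The following is a minimal sketch of that behavior; the class name LazyStreamSketch, the checked list, and the helper method are hypothetical names used only for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyStreamSketch {
    // Records every name the filter looks at, so we can see when it runs
    static final List<String> checked = new ArrayList<>();

    // Builds the pipeline but does NOT invoke a terminal operation
    static Stream<String> namesStartingWithJ() {
        return Stream.of("alice", "bob", "charlie", "joe", "john")
            .filter(n -> {
                checked.add(n);
                return n.startsWith("j");
            });
    }

    public static void main(String[] args) {
        Stream<String> pipeline = namesStartingWithJ();

        // No terminal operation yet, so the filter has not looked at any name
        System.out.println("Checked before forEach: " + checked.size());

        // forEach is the terminal operation that drives the pipeline
        pipeline.forEach(n -> System.out.println("Starts with j: " + n));

        System.out.println("Checked after forEach: " + checked.size());
    }
}
```

The first count printed is 0 and the second is 5, since the filter only runs once forEach() starts pulling elements through the pipeline.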

The following is a simple program which demonstrates the various operations on a stream of numbers such as counting, finding the distinct numbers, finding the sum, average, min, and max:

Listing.3
/*
 * 
 * Name:   StreamsTest3
 * 
 * Author: Bhaskar S
 * 
 * Date:   10/24/2014
 * 
 */

package com.polarsparc.java8;

import java.util.List;
import java.util.Arrays;
import java.util.ArrayList;

public class StreamsTest3 {
    public static void main(String[] args) {
        List<Integer> numbers1 = Arrays.asList(
                2, 4, 6, 8,
                8, 8, 6, 4,
                1, 3, 5, 7,
                7, 7, 9, 3
          );

        List<Integer> numbers2 = new ArrayList<>();

        System.out.println("<1> Count of all numbers: " + numbers1.stream().count());

        System.out.println("<2> Count of all numbers: " + numbers2.stream().count());

        System.out.println("<1> Count of distinct numbers: " + numbers1.stream().distinct().count());

        System.out.println("<2> Count of distinct numbers: " + numbers2.stream().distinct().count());

        System.out.println("<1> Sum of all numbers: " + numbers1.stream().mapToInt(Integer::intValue).sum());

        System.out.println("<2> Sum of all numbers: " + numbers2.stream().mapToInt(Integer::intValue).sum());

        numbers1.stream()
            .mapToInt(Integer::intValue)
            .average()
            .ifPresent(x -> System.out.println("<1> Average number: " + x));

        numbers2.stream()
            .mapToInt(Integer::intValue)
            .average()
            .ifPresent(x -> System.out.println("<2> Average number: " + x));

        numbers1.stream()
           .min(Integer::compare)
           .ifPresent(x -> System.out.println("<1> Min number: " + x));

        numbers2.stream()
           .min(Integer::compare)
           .ifPresent(x -> System.out.println("<2> Min number: " + x));

        numbers1.stream()
           .max(Integer::compare)
           .ifPresent(x -> System.out.println("<1> Max number: " + x));

        numbers2.stream()
           .max(Integer::compare)
           .ifPresent(x -> System.out.println("<2> Max number: " + x));
   }
}

Executing the program from Listing.3 will generate the following output:

Output.3

<1> Count of all numbers: 16
<2> Count of all numbers: 0
<1> Count of distinct numbers: 9
<2> Count of distinct numbers: 0
<1> Sum of all numbers: 88
<2> Sum of all numbers: 0
<1> Average number: 5.5
<1> Min number: 1
<1> Max number: 9

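The code in Listing.3 needs a little explanation. The stream() method on a List returns a sequential Stream over its elements. The count() operation returns the number of elements and distinct() drops the duplicates. The mapToInt() operation converts the Stream<Integer> into an IntStream, which provides numeric operations such as sum() and average(). The average() operation returns an OptionalDouble, while min() and max() return an Optional<Integer>. Since numbers2 is empty, these are empty for it and ifPresent() prints nothing, which is why Output.3 has no <2> lines for the average, min, and max.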
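Listing.3 creates a new stream for each of the count, sum, average, min, and max. As a possible alternative, IntStream also offers summaryStatistics(), which gathers all of these in a single pass. The following is a minimal sketch under that approach; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

class SummaryStatsSketch {
    // Collects count, sum, min, max, and average in one pass over the stream
    static IntSummaryStatistics summarize(List<Integer> numbers) {
        return numbers.stream()
            .mapToInt(Integer::intValue)
            .summaryStatistics();
    }

    public static void main(String[] args) {
        List<Integer> numbers1 = Arrays.asList(
            2, 4, 6, 8,
            8, 8, 6, 4,
            1, 3, 5, 7,
            7, 7, 9, 3
        );

        IntSummaryStatistics stats = summarize(numbers1);

        System.out.println("Count: " + stats.getCount());
        System.out.println("Sum: " + stats.getSum());
        System.out.println("Min: " + stats.getMin());
        System.out.println("Max: " + stats.getMax());
        System.out.println("Average: " + stats.getAverage());
    }
}
```

One design difference to keep in mind: for an empty stream, IntSummaryStatistics does not use Optional values. Its getAverage() returns 0.0, getMin() returns Integer.MAX_VALUE, and getMax() returns Integer.MIN_VALUE, so check getCount() first when the input may be empty.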

Continuing our journey on Streams, the following is a simple program that demonstrates the use of operations such as map, flatMap, sorted, and collect on streams:

Listing.4
/*
 * 
 * Name:   StreamsTest4
 * 
 * Author: Bhaskar S
 * 
 * Date:   10/25/2014
 * 
 */

package com.polarsparc.java8;

import java.util.List;
import java.util.Arrays;
import java.util.stream.Stream;
import java.util.stream.Collectors;

public class StreamsTest4 {
    public static void main(String[] args) {
        List<Integer> numbers1 = Arrays.asList(
            2, 4, 6, 8,
            8, 8, 6, 4
        );
        
        List<Integer> numbers2 = Arrays.asList(
            1, 3, 5, 7,
            7, 7, 9, 3
        );
        
        List<Integer> nums = numbers1.stream()
            .map(n -> 2*n*n + 1)
            .collect(Collectors.toList());
        System.out.println(nums);

        String s1 = Stream.of(numbers2)
            .sorted()
            .map(Object::toString)
            .collect(Collectors.joining(", "));
        System.out.println(s1);
       
        String s2 = Stream.of(numbers1, numbers2)
            .map(Object::toString)
            .collect(Collectors.joining(" :: "));
        System.out.println(s2);
       
        String s3 = Stream.of(numbers1, numbers2)
            .flatMap(nlist -> nlist.stream())
            .map(Object::toString)
            .collect(Collectors.joining(", "));
        System.out.println(s3);
   }
}

Executing the program from Listing.4 will generate the following output:

Output.4

[9, 33, 73, 129, 129, 129, 73, 33]
[1, 3, 5, 7, 7, 7, 9, 3]
[2, 4, 6, 8, 8, 8, 6, 4] :: [1, 3, 5, 7, 7, 7, 9, 3]
2, 4, 6, 8, 8, 8, 6, 4, 1, 3, 5, 7, 7, 7, 9, 3

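The code in Listing.4 needs a little explanation. The map() operation transforms each element (here n into 2*n*n + 1) and collect() gathers the results, either into a List with Collectors.toList() or into a String with Collectors.joining(). Note that Stream.of(numbers2) creates a stream with a single element, the List itself, rather than a stream of its numbers; sorted() therefore has nothing to reorder and the second line of Output.4 is just the List printed via toString(). To sort the numbers themselves, start from numbers2.stream(). Similarly, Stream.of(numbers1, numbers2) is a stream of two Lists; flatMap() replaces each List with the stream of its elements, producing one flat stream of numbers.

Until now, all our examples only used built-in types such as String and Integer. In this example we will demonstrate the use of Streams with a collection of user defined objects.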
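As a minimal sketch of the difference between Stream.of(list) and list.stream(), the following program counts the elements of both and then sorts the numbers from numbers2; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortedSketch {
    // Streams over the elements of the list, so sorted() orders the numbers
    static String sortedJoin(List<Integer> numbers) {
        return numbers.stream()
            .sorted()
            .map(Object::toString)
            .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<Integer> numbers2 = Arrays.asList(
            1, 3, 5, 7,
            7, 7, 9, 3
        );

        // Stream.of(list) has one element: the List itself
        System.out.println("Stream.of count: " + Stream.of(numbers2).count());

        // list.stream() has one element per number
        System.out.println("stream() count: " + numbers2.stream().count());

        // Prints: 1, 3, 3, 5, 7, 7, 7, 9
        System.out.println(sortedJoin(numbers2));
    }
}
```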

The following is a simple grade object that encapsulates a subject and the corresponding grade:

Listing.5
/*
 * 
 * Name:   Grade
 * 
 * Author: Bhaskar S
 * 
 * Date:   10/25/2014
 * 
 */

package com.polarsparc.java8;

public class Grade {
    private String subject;
    private double grade;
    
    public Grade(String s, double g) {
        this.subject = s;
        this.grade = g;
    }
    
    public String getSubject() {
        return subject;
    }

    public double getGrade() {
        return grade;
    }
    
    @Override
    public String toString() {
        return this.subject + " <" + this.grade + ">";
    }
}

The following is a simple program that finds the average grade and the best grade:

Listing.6
/*
 * 
 * Name:   StreamsTest5
 * 
 * Author: Bhaskar S
 * 
 * Date:   10/25/2014
 * 
 */

package com.polarsparc.java8;

import java.util.List;
import java.util.Arrays;

public class StreamsTest5 {
    public static void main(String[] args) {
        List<Grade> grades = Arrays.asList(
            new Grade("Math", 85.5),
            new Grade("Science", 78.0),
            new Grade("History", 67.0),
            new Grade("English", 72.5),
            new Grade("Spanish", 66.0)
        );
        
        grades.stream()
            .mapToDouble(Grade::getGrade)
            .average()
            .ifPresent(x -> System.out.println("Average grade: " + x));
        
        grades.stream()
            .reduce((g1, g2) -> g1.getGrade() > g2.getGrade() ? g1 : g2)
            .ifPresent(g -> System.out.println("Best grade - " + g));
   }
}

Executing the program from Listing.6 will generate the following output:

Output.5

Average grade: 73.8
Best grade - Math <85.5>

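The code in Listing.6 needs a little explanation. The mapToDouble() operation extracts the grade from each Grade object into a DoubleStream, whose average() returns an OptionalDouble. The reduce() operation combines the elements pairwise using the given lambda expression, here keeping the Grade with the higher grade at each step, and returns an Optional<Grade> that is empty only if the stream has no elements.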
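The best grade in Listing.6 is found with reduce(). One possible alternative is the max() operation with a Comparator built from the getter. The following is a minimal sketch of that option; the class BestGradeSketch and its nested Grade class (a trimmed stand-in for the one in Listing.5) are assumptions made to keep the example self-contained:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class BestGradeSketch {
    // Minimal stand-in for the Grade class from Listing.5
    static final class Grade {
        private final String subject;
        private final double grade;

        Grade(String s, double g) {
            this.subject = s;
            this.grade = g;
        }

        String getSubject() {
            return subject;
        }

        double getGrade() {
            return grade;
        }

        @Override
        public String toString() {
            return this.subject + " <" + this.grade + ">";
        }
    }

    // max() returns an empty Optional when the list has no grades
    static Optional<Grade> best(List<Grade> grades) {
        return grades.stream()
            .max(Comparator.comparingDouble(Grade::getGrade));
    }

    public static void main(String[] args) {
        List<Grade> grades = Arrays.asList(
            new Grade("Math", 85.5),
            new Grade("Science", 78.0),
            new Grade("History", 67.0),
            new Grade("English", 72.5),
            new Grade("Spanish", 66.0)
        );

        best(grades).ifPresent(g -> System.out.println("Best grade - " + g));
    }
}
```

The comparator version states the intent (the maximum by grade) directly, while reduce() remains the more general tool when the combining step is not a simple comparison.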

Now for the most exciting part. We introduced Streams as a way to perform operation(s) on a Collection of data elements, either in sequential or in parallel mode.

Up until now, all our examples were executed in a sequential fashion. In this example, we will demonstrate the use of Streams in both sequential and parallel mode without writing a single line of multi-threaded code.

Excited ???

The following is a simple object that encapsulates a number and its corresponding factors:

Listing.7
/*
 * 
 * Name:   MyFactors
 * 
 * Author: Bhaskar S
 * 
 * Date:   10/25/2014
 * 
 */

package com.polarsparc.java8;

import java.util.List;
import java.util.ArrayList;

public class MyFactors {
    private long num;
    private List<Long> factors;
    
    public MyFactors() {
        num = 0;
        factors = new ArrayList<>();
    }
    
    public long getNum() {
        return num;
    }
    
    public void setNum(long num) {
        this.num = num;
    }
    
    public void add(long n) {
        this.factors.add(n);
    }
    
    public List<Long> getFactors() {
        return this.factors;
    }
    
    @Override
    public String toString() {
        return "Num: " + num + ", Factors: " + factors.toString();
    }
}

The following is a simple program that finds the factors for all the numbers in a list in sequential mode:

Listing.8
/*
 * 
 * Name:   StreamsTest6
 * 
 * Author: Bhaskar S
 * 
 * Date:   10/25/2014
 * 
 */

package com.polarsparc.java8;

import java.util.Arrays;
import java.util.stream.LongStream;

public class StreamsTest6 {
    public static void main(String[] args) {
        LongStream numbers = Arrays.stream(new long[] {
            976534, 1267543, 1543087, 845987, 1720728,
            590630, 1935609, 1390765, 320609, 1123001
        });
        
        long start = System.nanoTime();
        
        numbers
            .mapToObj(StreamsTest6::findFactors)
            .forEach(fac -> System.out.println(fac));
        
        long end = System.nanoTime();
        
        System.out.printf("Total time (ms): %d\n", (end-start)/(1000*1000));
    }
    
    public static MyFactors findFactors(long num) {
        System.out.printf("%s: factors for num = %d\n", Thread.currentThread().getName(), num);
        
        MyFactors factors = new MyFactors();
        
        if (num > 0) {
            factors.setNum(num);
            factors.add(1);
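            // Note: the bound n < num/2 leaves out num/2 itself, so for an even
            // num the factor num/2 is not listed (use n <= num/2 to include it)
            for (long n = 2; n < num/2; n++) {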
                if (num % n == 0) {
                    factors.add(n);
                }
            }
            factors.add(num);
        }
        
        return factors;
    }
}

Executing the program from Listing.8 will generate the following output:

Output.6

main: factors for num = 976534
Num: 976534, Factors: [1, 2, 13, 23, 26, 46, 71, 142, 299, 529, 598, 923, 1058, 1633, 1846, 3266, 6877, 13754, 21229, 37559, 42458, 75118, 976534]
main: factors for num = 1267543
Num: 1267543, Factors: [1, 47, 149, 181, 7003, 8507, 26969, 1267543]
main: factors for num = 1543087
Num: 1543087, Factors: [1, 7, 13, 31, 91, 217, 403, 547, 2821, 3829, 7111, 16957, 49777, 118699, 220441, 1543087]
main: factors for num = 845987
Num: 845987, Factors: [1, 845987]
main: factors for num = 1720728
Num: 1720728, Factors: [1, 2, 3, 4, 6, 8, 9, 12, 18, 24, 36, 72, 23899, 47798, 71697, 95596, 143394, 191192, 215091, 286788, 430182, 573576, 1720728]
main: factors for num = 590630
Num: 590630, Factors: [1, 2, 5, 10, 59063, 118126, 590630]
main: factors for num = 1935609
Num: 1935609, Factors: [1, 3, 13, 31, 39, 93, 403, 1209, 1601, 4803, 20813, 49631, 62439, 148893, 645203, 1935609]
main: factors for num = 1390765
Num: 1390765, Factors: [1, 5, 349, 797, 1745, 3985, 278153, 1390765]
main: factors for num = 320609
Num: 320609, Factors: [1, 320609]
main: factors for num = 1123001
Num: 1123001, Factors: [1, 11, 121, 9281, 102091, 1123001]
Total time (ms): 147

The following is the same simple program that finds the factors for all the numbers in a list, except that it executes in parallel mode with a small code change:

Listing.9
/*
 * 
 * Name:   StreamsTest7
 * 
 * Author: Bhaskar S
 * 
 * Date:   10/25/2014
 * 
 */

package com.polarsparc.java8;

import java.util.Arrays;
import java.util.stream.LongStream;

public class StreamsTest7 {
    public static void main(String[] args) {
        LongStream numbers = Arrays.stream(new long[] {
            976534, 1267543, 1543087, 845987, 1720728,
            590630, 1935609, 1390765, 320609, 1123001
        });
        
        long start = System.nanoTime();
        
        numbers
            .parallel()
            .mapToObj(StreamsTest7::findFactors)
            .forEach(fac -> System.out.println(fac));
        
        long end = System.nanoTime();
        
        System.out.printf("Total time (ms): %d\n", (end-start)/(1000*1000));
    }
    
    public static MyFactors findFactors(long num) {
        System.out.printf("%s: factors for num = %d\n", Thread.currentThread().getName(), num);
        
        MyFactors factors = new MyFactors();
        
        if (num > 0) {
            factors.setNum(num);
            factors.add(1);
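            // Note: the bound n < num/2 leaves out num/2 itself, so for an even
            // num the factor num/2 is not listed (use n <= num/2 to include it)
            for (long n = 2; n < num/2; n++) {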
                if (num % n == 0) {
                    factors.add(n);
                }
            }
            factors.add(num);
        }
        
        return factors;
    }
}

The only change is the addition of the operation parallel() to the stream.

Executing the program from Listing.9 will generate the following output:

Output.7

main: factors for num = 1935609
ForkJoinPool.commonPool-worker-1: factors for num = 1543087
ForkJoinPool.commonPool-worker-2: factors for num = 320609
ForkJoinPool.commonPool-worker-3: factors for num = 1390765
Num: 320609, Factors: [1, 320609]
ForkJoinPool.commonPool-worker-2: factors for num = 1123001
Num: 1123001, Factors: [1, 11, 121, 9281, 102091, 1123001]
ForkJoinPool.commonPool-worker-2: factors for num = 1267543
Num: 1390765, Factors: [1, 5, 349, 797, 1745, 3985, 278153, 1390765]
ForkJoinPool.commonPool-worker-3: factors for num = 1720728
Num: 1935609, Factors: [1, 3, 13, 31, 39, 93, 403, 1209, 1601, 4803, 20813, 49631, 62439, 148893, 645203, 1935609]
main: factors for num = 590630
Num: 1267543, Factors: [1, 47, 149, 181, 7003, 8507, 26969, 1267543]
ForkJoinPool.commonPool-worker-2: factors for num = 976534
Num: 1543087, Factors: [1, 7, 13, 31, 91, 217, 403, 547, 2821, 3829, 7111, 16957, 49777, 118699, 220441, 1543087]
ForkJoinPool.commonPool-worker-1: factors for num = 845987
Num: 590630, Factors: [1, 2, 5, 10, 59063, 118126, 590630]
Num: 976534, Factors: [1, 2, 13, 23, 26, 46, 71, 142, 299, 529, 598, 923, 1058, 1633, 1846, 3266, 6877, 13754, 21229, 37559, 42458, 75118, 976534]
Num: 845987, Factors: [1, 845987]
Num: 1720728, Factors: [1, 2, 3, 4, 6, 8, 9, 12, 18, 24, 36, 72, 23899, 47798, 71697, 95596, 143394, 191192, 215091, 286788, 430182, 573576, 1720728]
Total time (ms): 112

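Notice from Output.7 that Streams automatically handles parallelism under the hood using the common Fork-Join pool: the numbers are processed on the ForkJoinPool.commonPool worker threads as well as on the main thread. Also notice that the results are no longer printed in the order of the input, since forEach() on a parallel stream does not preserve the encounter order.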
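In Output.7 the results come out in a different order than the input numbers. If the order matters, one option is to collect the results into a List (or to use forEachOrdered() instead of forEach()), both of which preserve the encounter order even in parallel mode. The following is a minimal sketch of the collect() option; the class name, the method names, and the divisor-counting stand-in for an expensive computation are hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class OrderedParallelSketch {
    // Stand-in for an expensive computation: counts the divisors of num
    static long countDivisors(long num) {
        return LongStream.rangeClosed(1, num)
            .filter(n -> num % n == 0)
            .count();
    }

    // Runs in parallel, yet the resulting List follows the input order
    static List<String> describe(long[] numbers) {
        return LongStream.of(numbers)
            .parallel()
            .mapToObj(n -> n + " has " + countDivisors(n) + " divisors")
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long[] numbers = { 976534, 1267543, 1543087, 845987, 1720728 };

        // The work is spread over multiple threads, but the lines are
        // printed in the same order as the numbers in the array
        describe(numbers).forEach(System.out::println);
    }
}
```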

We did not have to write a single line of multi-threaded code - ain't this powerful !!!