The Right Way to Use Parallel Streams

Posted by LIUSHUO on September 24, 2019

---
layout: post
title: "Have You Really Mastered Parallel Streams in Practice?"
subtitle: "When and How to Use Parallel Stream"
date: 2019-09-24
author: LiuShuo
header-img: img/home-bg-o.jpg
catalog: true
tags:
  - Java
  - Streaming
---

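A parallel stream is a sharp sword: handle it badly and you will cut yourself. This article looks at when parallel streams are actually appropriate.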

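Note: most of the points in this article come from a StackOverflow thread and from Brian Goetz's talk "Understanding Parallel Stream Performance in Java SE 8" (both listed under References). Both are well worth reading.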

Parallel streams are not a cure-all

it is a bad idea to just drop .parallel() all over the place simply because you can.

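Turning a stream into a parallel stream does not automatically speed up your computation on a multi-core machine. The computation and its logic still come from your own code. If that code is poorly suited to parallel execution, going parallel can reduce efficiency and may even end up slower than the sequential version.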

The one thing that is certain about a parallel stream is that it does more work than a sequential one:

A parallel execution will always involve more work than a sequential one, because in addition to solving the problem, it also has to perform dispatching and coordinating of sub-tasks.
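One way to see the extra dispatching work is to record which threads run the per-element code. The following is a minimal sketch, not a benchmark; the class name `ThreadsUsedDemo` and the method `threadsUsed` are made up for illustration. The number of threads reported for the parallel case depends on the machine and on the common ForkJoinPool, so no specific output is shown.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class ThreadsUsedDemo {

    // Runs the same pipeline sequentially or in parallel and records
    // the names of the threads that executed the per-element work.
    static Set<String> threadsUsed(boolean parallel) {
        // Thread-safe set, because several threads may add to it at once.
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream range = IntStream.rangeClosed(1, 100_000);
        if (parallel) {
            range = range.parallel();
        }
        range.forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Sequential: all work stays on the calling thread.
        System.out.println("sequential: " + threadsUsed(false));
        // Parallel: work is split into sub-tasks that may also run on pool workers.
        System.out.println("parallel:   " + threadsUsed(true));
    }
}
```

In the sequential run the set contains only the calling thread. In the parallel run the elements are split into sub-tasks that have to be handed out and joined again, which is the overhead the quote refers to.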

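The advantage of sequential execution is that its result is predictable, while the drawback of parallel execution is nondeterminism. Sometimes this nondeterminism can be avoided by constraining the operations that run in parallel. For example, a reduction operator used in a parallel stream generally needs to be 1) stateless and 2) associative.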

Further, note that parallelism also often exposes nondeterminism in the computation that is often hidden by sequential implementations; sometimes this doesn’t matter, or can be mitigated by constraining the operations involved (i.e., reduction operators must be stateless and associative.)
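The associativity requirement can be illustrated with `reduce`. This is a small sketch under assumed names (`ReduceDemo`, `sum`, `subtract`): addition is associative, so the parallel and sequential results agree, while subtraction is not associative (and 0 is not an identity for it), so its parallel result is not guaranteed to match the sequential one.

```java
import java.util.stream.IntStream;

public class ReduceDemo {

    // Associative operator: how the stream is split does not affect the result.
    static int sum(boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, 1000);
        return (parallel ? s.parallel() : s).reduce(0, Integer::sum);
    }

    // Non-associative operator: (a - b) - c != a - (b - c), so the result
    // of a parallel run depends on how the work happens to be split.
    static int subtract(boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, 1000);
        return (parallel ? s.parallel() : s).reduce(0, (a, b) -> a - b);
    }

    public static void main(String[] args) {
        System.out.println("sum sequential:      " + sum(false));
        System.out.println("sum parallel:        " + sum(true));
        System.out.println("subtract sequential: " + subtract(false));
        // May differ from the sequential value, and from machine to machine.
        System.out.println("subtract parallel:   " + subtract(true));
    }
}
```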

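If the parallel work touches a shared resource, you must guarantee thread safety yourself. Otherwise these side effects will kill any hope of a speedup from parallelism.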

Moreover, remember that parallel streams don’t magically solve all the synchronization problems. If a shared resource is used by the predicates and functions used in the process, you’ll have to make sure that everything is thread-safe. In particular, side effects are things you really have to worry about if you go parallel.
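A short sketch of what such a side effect can look like, using hypothetical names (`SideEffectDemo`, `racyCount`, `atomicCount`, `streamCount`). The first method mutates shared state without synchronization and can silently lose updates. The second is correct but makes all threads contend on a single counter. The third avoids the shared state altogether and lets the stream do the counting.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class SideEffectDemo {

    static final int N = 1_000_000;

    // Unsafe: counter[0]++ is an unsynchronized read-modify-write,
    // so concurrent increments can overwrite each other.
    static int racyCount() {
        int[] counter = {0};
        IntStream.range(0, N).parallel()
                 .filter(i -> i % 2 == 0)
                 .forEach(i -> counter[0]++);
        return counter[0];
    }

    // Thread-safe, but every thread contends on the same atomic counter.
    static int atomicCount() {
        AtomicInteger counter = new AtomicInteger();
        IntStream.range(0, N).parallel()
                 .filter(i -> i % 2 == 0)
                 .forEach(i -> counter.incrementAndGet());
        return counter.get();
    }

    // No shared mutable state: the terminal operation combines partial results.
    static long streamCount() {
        return IntStream.range(0, N).parallel()
                        .filter(i -> i % 2 == 0)
                        .count();
    }

    public static void main(String[] args) {
        System.out.println("racy:   " + racyCount());   // may be lower than N / 2
        System.out.println("atomic: " + atomicCount());
        System.out.println("count:  " + streamCount());
    }
}
```

Where possible, the third form is usually the one to prefer: a side-effect-free pipeline leaves nothing for you to synchronize.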

What to do

Brian Goetz offers a very good way of looking at the question:

It is best to develop first using sequential execution and then apply parallelism where

(A) you know that there’s actually benefit to increased performance and

(B) that it will actually deliver increased performance.

(A) is a business problem, not a technical one. If you are a performance expert, you’ll usually be able to look at the code and determine (B), but the smart path is to measure. (And, don’t even bother until you’re convinced of (A); if the code is fast enough, better to apply your brain cycles elsewhere.)
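"Measure" can start as simply as timing both variants of the same pipeline. The sketch below is only a rough illustration with made-up names (`MeasureDemo`, `sumOfSquares`, `bestNanos`); it uses `System.nanoTime` with one warm-up run, which is far less reliable than a proper benchmark harness such as JMH. Treat the numbers as a hint, not a verdict.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class MeasureDemo {

    // Keeps results observable so the JIT is less likely to discard the work.
    static volatile long sink;

    // The workload under test: same pipeline, sequential or parallel.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        return (parallel ? s.parallel() : s).map(x -> x * x).sum();
    }

    // Runs the task once as warm-up, then returns the best time of `runs` runs.
    static long bestNanos(LongSupplier task, int runs) {
        sink = task.getAsLong(); // warm-up
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        long seq = bestNanos(() -> sumOfSquares(n, false), 5);
        long par = bestNanos(() -> sumOfSquares(n, true), 5);
        System.out.println("sequential best: " + seq / 1_000_000.0 + " ms");
        System.out.println("parallel best:   " + par / 1_000_000.0 + " ms");
    }
}
```

If the parallel variant does not clearly win for realistic input sizes on the target machine, condition (B) is not met and the sequential version should stay.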

JB Nizet also suggests a sound rule of thumb for practice:

I would use sequential streams by default and only consider parallel ones if

  • I have a massive amount of items to process (or the processing of each item takes time and is parallelizable)

  • I have a performance problem in the first place

  • I don’t already run the process in a multi-thread environment (for example: in a web container, if I already have many requests to process in parallel, adding an additional layer of parallelism inside each request could have more negative than positive effects)

References

  • https://stackoverflow.com/questions/20375176/should-i-always-use-a-parallel-stream-when-possible
  • https://www.infoq.com/presentations/parallel-java-se-8
  • https://stackoverrun.com/cn/q/10341100

This article was first published on LiuShuo's Blog. Please keep a link to the original when reposting.