5 Days to Master C# Parallel and Multithreaded Programming: Day 2, Concurrent Collections and PLINQ

Date: June 8, 2015

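　　In the previous post, 5 Days to Master C# Parallel and Multithreaded Programming: Day 1, Getting to Know Parallel, we learned how to use Parallel. Parallel programming is, at its core, multithreaded programming, so when several threads work on one task at the same time, access to shared resources becomes a problem. This is what is known as thread safety. Software projects are a real-world example of parallel work: different modules are assigned to different people and built at the same time, which is how a large project gets finished in a short time. But if everyone writes their own code without regard for the others and the pieces cannot be merged at the end, the parallel work was pointless.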

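　　Parallel algorithms brought parallel collections with them, that is, thread-safe collections. Microsoft was thorough here and did not forget LINQ either: it also released a parallel version of LINQ, PLINQ (Parallel LINQ).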

I. Concurrent Collections: Thread-Safe Collections

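　　Parallel computation uses several threads at once, so each thread's access to shared resources has to be controlled. Let's first see how the List<T> collection we use every day behaves under parallel computation. Create a new console application and add a PEnumerable class (you can also write the test directly in the Main method, but keeping it separate is recommended), then write the following method: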

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections.Concurrent;
using System.Diagnostics; // needed for Stopwatch in the PLINQ tests later in this post

namespace ThreadPool
{
   public class PEnumerable
   {
      public static void ListWithParallel()
      {
         List<int> list = new List<int>();
         // List<T>.Add is not thread-safe; calling it from parallel iterations is a data race.
         Parallel.For(0, 10000, item =>
         {
            list.Add(item);
         });
         Console.WriteLine("List's count is {0}", list.Count);
      }
   }
}

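Press F5 to run. The output looks like this:

The output shows 5851, yet the loop ran 10000 times. Why is the count wrong? Because List<T> is not a thread-safe collection: Add is not an atomic operation, so when several threads call it at the same time, their updates to the list's internal array and size can overwrite one another and items are lost. The exact count varies from run to run, and the call may even throw an exception.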
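Before turning to the concurrent collections, it may help to see the most basic fix: serializing access to the list with a lock. The following is a minimal sketch, not part of the original test project; the class name LockedListDemo, the method FillWithLock, and the lock object gate are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockedListDemo
{
    // Hypothetical helper: fills a List<int> from Parallel.For and
    // returns the final count. Every Add goes through the same lock,
    // so only one thread touches the list at a time.
    public static int FillWithLock(int count)
    {
        var list = new List<int>();
        var gate = new object(); // private lock object (illustrative name)

        Parallel.For(0, count, i =>
        {
            lock (gate)
            {
                list.Add(i);
            }
        });

        return list.Count;
    }
}
```

Calling LockedListDemo.FillWithLock(10000) from Main should report all 10000 items. The trade-off is that the lock serializes every Add, so the loop body gains little from running in parallel; the concurrent collections are designed to reduce that contention.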

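Now let's look at the concurrent (thread-safe) collections in the System.Collections.Concurrent namespace. We start with the generic ConcurrentBag<T> collection. Adding items works much as it does with List<T>, so let's write a method to test it: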

public static void ConcurrentBagWithPallel()
      {
         ConcurrentBag<int> list = new ConcurrentBag<int>();
         Parallel.For(0, 10000, item =>
         {
            list.Add(item);
         });
         Console.WriteLine("ConcurrentBag's count is {0}", list.Count);
      }

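Run both methods together. The output is as follows:

As you can see, the ConcurrentBag result is correct. Next, let's modify the code to see how the data is actually stored in the ConcurrentBag: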

public static void ConcurrentBagWithPallel()
      {
         ConcurrentBag<int> list = new ConcurrentBag<int>();
         Parallel.For(0, 10000, item =>
         {
            list.Add(item);
         });
         Console.WriteLine("ConcurrentBag's count is {0}", list.Count);
         // Print the first few items to see the order in which the bag enumerates them.
         int n = 0;
         foreach(int i in list)
         {
            if (n > 10)
               break;
            n++;
            Console.WriteLine("Item[{0}] = {1}",n,i);
         }
         Console.WriteLine("ConcurrentBag's max item is {0}", list.Max());

      }

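Here is the output:

As you can see, the items in the ConcurrentBag are not in insertion order. ConcurrentBag<T> is an unordered collection and makes no ordering guarantee. The LINQ methods we normally use, such as Max, First, and Last, are still available because it implements IEnumerable<T>, so it is used much like any other enumerable. See Microsoft's MSDN documentation for the details.

There are many more thread-safe collections, and most mirror the collections we use every day: ConcurrentDictionary is the counterpart of Dictionary, and there are also ConcurrentStack, ConcurrentQueue, and others.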
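To make the other collections concrete, here is a short sketch of ConcurrentDictionary and ConcurrentQueue used from Parallel.For. It is an illustration under assumptions, not code from the original project: the class ConcurrentCollectionsDemo and its two methods are hypothetical names, and the bucket-counting task is chosen only because its result is easy to check.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentCollectionsDemo
{
    // Counts how many numbers in [0, total) fall into each remainder
    // class modulo `buckets`, updating one shared dictionary in parallel.
    public static ConcurrentDictionary<int, int> CountByRemainder(int total, int buckets)
    {
        var counts = new ConcurrentDictionary<int, int>();
        Parallel.For(0, total, i =>
        {
            // AddOrUpdate stores 1 for a new key; otherwise it applies the
            // update delegate and retries if another thread changed the
            // value in the meantime, so no increment is lost.
            counts.AddOrUpdate(i % buckets, 1, (key, old) => old + 1);
        });
        return counts;
    }

    // Enqueues [0, total) in parallel, then drains the queue on the
    // calling thread and returns how many items came out.
    public static int EnqueueThenDrain(int total)
    {
        var queue = new ConcurrentQueue<int>();
        Parallel.For(0, total, i => queue.Enqueue(i));

        int drained = 0;
        int item;
        while (queue.TryDequeue(out item))
        {
            drained++;
        }
        return drained;
    }
}
```

Note the Try-style methods such as TryDequeue: instead of throwing when the collection is empty, they return false, which suits code where another thread may have taken the last item first.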

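II. Parallel LINQ: Usage and Performance

1. AsParallel

We have already seen the parallel versions of For and ForEach; now let's see what the parallel version of LINQ looks like. For the test, add a Custom class: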

public class Custom
   {
      public string Name { get; set; }
      public int Age { get; set; }
      public string Address { get; set; }
   }

Then write the following test code:

 public static void TestPLinq()
      {
         Stopwatch sw = new Stopwatch();
         List<Custom> customs = new List<Custom>();
         for (int i = 0; i < 2000000; i++)
         {
            customs.Add(new Custom() { Name = "Jack", Age = 21, Address = "NewYork" });
            customs.Add(new Custom() { Name = "Jime", Age = 26, Address = "China" });
            customs.Add(new Custom() { Name = "Tina", Age = 29, Address = "ShangHai" });
            customs.Add(new Custom() { Name = "Luo", Age = 30, Address = "Beijing" });
            customs.Add(new Custom() { Name = "Wang", Age = 60, Address = "Guangdong" });
            customs.Add(new Custom() { Name = "Feng", Age = 25, Address = "YunNan" });
         }

         sw.Start();
         var result = customs.Where<Custom>(c => c.Age > 26).ToList();
         sw.Stop();
         Console.WriteLine("Linq time is {0}.",sw.ElapsedMilliseconds);

         sw.Restart(); // Restart() resets the elapsed time and starts timing again
         var result2 = customs.AsParallel().Where<Custom>(c => c.Age > 26).ToList();
         sw.Stop();
         Console.WriteLine("Parallel Linq time is {0}.", sw.ElapsedMilliseconds);
      }

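The only change is the added call to AsParallel(). Here is the output:

In this run the parallel query took about half the time of the sequential one. The gap is not always this large; it depends on how busy the system is at the time, so it is worth running the test several times.

AsParallel() can be applied to any IEnumerable<T> source, including List<T>. For large data sets or expensive per-item work it can shorten query time, but partitioning the input and merging the results add overhead, so small or cheap queries may run no faster, or even slower, than plain LINQ.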
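PLINQ also exposes a few operators that control how a parallel query runs. The sketch below shows two of them, AsOrdered and WithDegreeOfParallelism, on a simple integer range rather than the Custom list. The class PlinqOptionsDemo and the method EvenSquaresInOrder are hypothetical names, and the degree of parallelism of 2 is an arbitrary value picked for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqOptionsDemo
{
    // Returns the squares of the even numbers in [0, count),
    // computed with PLINQ but delivered in source order.
    public static List<long> EvenSquaresInOrder(int count)
    {
        return Enumerable.Range(0, count)
            .AsParallel()
            .AsOrdered()                 // preserve the order of the source sequence
            .WithDegreeOfParallelism(2)  // use at most two concurrent tasks (arbitrary)
            .Where(n => n % 2 == 0)
            .Select(n => (long)n * n)
            .ToList();
    }
}
```

Without AsOrdered, a PLINQ query is free to return results in any order, which is usually faster. Order preservation is worth requesting only when the consumer actually depends on it.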

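2. The GroupBy method

In real projects we often need to process data, for example grouping and counting it. LINQ can do this with GroupBy; here we also look at another method, ToLookup. Write a test method as follows: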

public static void OrderByTest()
      {
         Stopwatch stopWatch = new Stopwatch();
         List<Custom> customs = new List<Custom>();
         for (int i = 0; i < 2000000; i++)
         {
            customs.Add(new Custom() { Name = "Jack", Age = 21, Address = "NewYork" });
            customs.Add(new Custom() { Name = "Jime", Age = 26, Address = "China" });
            customs.Add(new Custom() { Name = "Tina", Age = 29, Address = "ShangHai" });
            customs.Add(new Custom() { Name = "Luo", Age = 30, Address = "Beijing" });
            customs.Add(new Custom() { Name = "Wang", Age = 60, Address = "Guangdong" });
            customs.Add(new Custom() { Name = "Feng", Age = 25, Address = "YunNan" });
         }

         stopWatch.Restart();
         var groupByAge = customs.GroupBy(item => item.Age).ToList();
         foreach (var item in groupByAge)
         {
            Console.WriteLine("Age={0},count = {1}", item.Key, item.Count());
         }
         stopWatch.Stop();

         Console.WriteLine("Linq group by time is: " + stopWatch.ElapsedMilliseconds);


         stopWatch.Restart();
         var lookupList = customs.ToLookup(i => i.Age);
         foreach (var item in lookupList)
         {
            Console.WriteLine("LookUP:Age={0},count = {1}", item.Key, item.Count());
         }
         stopWatch.Stop();
         Console.WriteLine("LookUp group by time is: " + stopWatch.ElapsedMilliseconds);
      }
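
The test method above runs GroupBy and ToLookup as ordinary sequential LINQ. Both operators also have PLINQ versions, reached by calling AsParallel() first. The following is a small sketch of that variant, not a benchmark: the class ParallelGroupingDemo and its methods are hypothetical names, and a plain sequence of ages stands in for the Age values of the Custom list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelGroupingDemo
{
    // Groups the ages with PLINQ and returns the number of items per age.
    public static Dictionary<int, int> CountByAge(IEnumerable<int> ages)
    {
        return ages
            .AsParallel()
            .GroupBy(age => age)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Builds a lookup with PLINQ; indexing it by key returns all items
    // with that key (or an empty sequence if the key is absent).
    public static ILookup<int, int> LookupByAge(IEnumerable<int> ages)
    {
        return ages.AsParallel().ToLookup(age => age);
    }
}
```

Whether the parallel versions are faster than the sequential ones depends on the data size and the machine, so it is worth timing them with a Stopwatch the same way as in the method above.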