Research on data pre-deployment in information service flow of digital ocean cloud computing
Abstract: Data pre-deployment in HDFS (Hadoop Distributed File System) is more complicated than data deployment in traditional file systems. Several key issues must be addressed: which data to prefetch, where to place the prefetched data, how much data to prefetch, and how to balance the prefetching service against the normal data accesses it conflicts with. To address these issues, we exploit the characteristics of digital ocean information service flows and propose a deployment scheme that combines input data prefetching with directed storage of output data. The scheme runs data preparation in parallel with data processing, which reduces the large I/O time cost that digital ocean cloud computing platforms incur when multi-source files are processed cooperatively. Experimental results show that, compared with the traditional Hadoop mechanism, the scheme achieves a higher degree of parallelism, shortens the waiting time of running service nodes, and alleviates data access conflicts.
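The overlap of data preparation and data processing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use any HDFS API: `run_pipeline`, `fetch`, `process`, and `store` are hypothetical names, the "blocks" are in-memory lists, and `store` merely stands in for writing output to a chosen target location.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(block_ids, fetch, process, store):
    """Process blocks in order while the next block is fetched in the background.

    fetch, process and store are hypothetical callbacks supplied by the caller;
    they are assumptions of this sketch, not part of Hadoop or HDFS.
    """
    if not block_ids:
        return []
    stored = []
    # A single background worker plays the role of the prefetching service.
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        pending = prefetcher.submit(fetch, block_ids[0])
        for i, block_id in enumerate(block_ids):
            data = pending.result()  # wait only if the prefetch has not finished
            if i + 1 < len(block_ids):
                # Start preparing the next input before processing the current one.
                pending = prefetcher.submit(fetch, block_ids[i + 1])
            result = process(data)
            # Directed storage: the caller decides where the output is placed.
            stored.append(store(block_id, result))
    return stored


def demo():
    """Run the pipeline on toy in-memory data (illustrative values only)."""
    source = {"sst": [1, 2, 3], "wind": [4, 5], "wave": [6]}
    placed = {}

    def fetch(block_id):
        return source[block_id]

    def process(values):
        return sum(values)

    def store(block_id, result):
        placed[block_id] = result
        return block_id

    order = run_pipeline(list(source), fetch, process, store)
    return order, placed
```

In this sketch only one block is prefetched ahead of the one being processed, which bounds the extra memory and I/O that prefetching adds. How far ahead to prefetch, and where to place the data, are the questions the paper itself addresses.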
Key words:
- HDFS
- data prefetching
- cloud computing
- service flow
- digital ocean