Workshop Overview

The Fourth Big Data and Computational Intelligence Workshop (BDCI2017) will be held in Beijing from July 29 to 30, 2017. This two-day workshop invites world-class speakers to introduce and share recent achievements in data science, computational intelligence, and their applications. The workshop is organized by the Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC) with the sponsorship of Beihang University, the Beijing Municipal Commission of Education, and the National Basic Research Program (973 Program).


July 29, 2017

Venue: Kunlun Hall, 2nd Floor, Vision Hotel
Time | Topic | Speaker | Chair
8:00 ~ 8:30 | Registration | |
8:30 ~ 9:00 | Opening Remarks | |
9:00 ~ 9:45 | Is Big Data Analytics beyond the Reach of Small Companies? | Wenfei Fan | Yongyi Mao
9:45 ~ 10:30 | Answering queries on databases with nulls: combining correctness and efficiency | Leonid Libkin |
10:30 ~ 10:45 | Tea Break | |
10:45 ~ 11:30 | A uniform approach to data exchange and data quality | Floris Geerts |
11:30 ~ 12:00 | Big Data Analysis for Human Brain Functional Network Study | Tingting Zhang |
12:00 ~ 13:30 | Lunch | |
13:30 ~ 14:10 | A Computational Theory of Ternary-Space Big Data: Reflections on Big Data Intelligence | Wenwu Zhu | Chunming Hu
14:10 ~ 14:50 | Collaborative Computing Methods for Heterogeneous Multi-Source Big Data | Minyi Guo |
14:50 ~ 15:10 | Tea Break | |
15:10 ~ 15:50 | Network Data Science and Big Data Analytics Systems | Xueqi Cheng |
15:50 ~ 16:30 | Knowledge Graph Research and Application at Baidu | Yong Zhu |
16:30 ~ 17:00 | From Event Detection to Data Analysis System: Decoding Intelligence | Jianxin Li |
17:00 ~ 18:30 | Dinner | |

July 30, 2017

Venue: Kunlun Hall, 2nd Floor, Vision Hotel
Time | Topic | Speaker | Chair
8:30 ~ 9:15 | Analyzing and Speeding Up Private Blockchains | OOI Beng Chin | Shuai Ma
9:15 ~ 10:00 | Human-Powered Machine Learning | Lei Chen |
10:00 ~ 10:30 | Crowd Intelligence Resource Management: Problems and Research Progress | Hailong Sun |
10:30 ~ 10:45 | Tea Break | |
10:45 ~ 11:30 | New Computer System Architectures for the Big Data Era | Hai Jin |
11:30 ~ 12:15 | Gemini: A High-Performance Big Data Analytics System Based on Graph Computing | Wenguang Chen |
12:15 ~ 14:00 | Lunch | |
14:00 ~ 14:30 | A Discussion of Security Protection Technologies for Industrial Control Systems | Limin Sun | Jianxin Li
14:30 ~ 15:00 | New IoT Security Threats and New Defense Countermeasures | Zhigang Kan |
15:00 ~ 15:30 | Skyfire: Data-Driven Seed Generation for Fuzzing | Wei Lei |
15:30 ~ 15:45 | Tea Break | |
15:45 ~ 16:15 | Cybersecurity Risk Analysis and Recommendations for Industrial Enterprises | Fawang Liu |
16:15 ~ 16:45 | Industrial Information Security: A Grave Situation and an Urgent Task | Ge Zhang |
16:45 ~ 17:15 | A Discussion of Industrial Information Security Technologies | Bo Li |
17:15 ~ 18:30 | Dinner | |

特邀报告 Invited Talks

Title: Is Big Data Analytics beyond the Reach of Small Companies?

by Wenfei Fan (BDBC, Beihang University & University of Edinburgh)

Abstract :

Big data analytics is often prohibitively costly. It is typically conducted by parallel processing on a cluster of machines, and is considered a privilege of big companies that can afford the resources. This talk argues that big data analytics is accessible to small companies with constrained resources. As evidence, we present BEAS, a system for querying big data with bounded resources. BEAS advocates a resource-constrained query evaluation paradigm, based on a theory of bounded evaluation and a data-driven approximation scheme.
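The flavor of bounded evaluation can be illustrated with a small, hypothetical sketch (not the BEAS implementation): when the data satisfies an access constraint such as "each customer places at most 5 orders, and orders are indexed by customer", a query about one customer can be answered by fetching a bounded number of tuples, independent of the total dataset size. The names `orders_of` and `MAX_ORDERS_PER_CUSTOMER` below are illustrative assumptions.

```python
# Hypothetical sketch of resource-bounded querying: an access constraint
# ("each customer has at most 5 orders") plus the index it mandates lets
# us answer a query while touching at most 5 tuples, however large the
# dataset grows.

# Dataset: (customer_id, order_id) pairs; millions of rows in practice.
orders = [(c, o) for c in range(10_000) for o in range(3)]

# Index mandated by the access constraint: customer_id -> its orders.
index = {}
for c, o in orders:
    index.setdefault(c, []).append((c, o))

MAX_ORDERS_PER_CUSTOMER = 5  # the bound N in the access constraint

def orders_of(customer_id):
    """Answer "orders of this customer" by fetching at most N tuples."""
    fetched = index.get(customer_id, [])[:MAX_ORDERS_PER_CUSTOMER]
    return [o for _, o in fetched]

print(orders_of(42))  # fetches at most 5 of the 30,000 tuples
```

The point of the sketch is only the shape of the guarantee: the cost of `orders_of` is bounded by the constraint, not by the size of `orders`.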

Biography :

Professor Wenfei Fan is the Chair of Web Data Management in the School of Informatics, University of Edinburgh, UK, the chief scientist of the Beijing Advanced Innovation Center for Big Data and Brain Computing, and the director of the International Research Center on Big Data, Beihang University, Beijing, China. Prior to his move to the UK, he worked for Bell Labs, Lucent Technologies in the US. He received his PhD from the University of Pennsylvania, USA, and his MS and BS from Peking University, China.

Professor Fan is a Fellow of the Royal Society of Edinburgh, UK, a Fellow of the ACM, USA, a National Professor of the 1000-Talent Program and a Yangtze River Scholar, China. He is a recipient of the Alberto O. Mendelzon Test-of-Time Award of ACM PODS (2010, 2015), the Best Paper Awards for ACM SIGMOD 2017 and VLDB 2010, the Roger Needham Award in 2008 (UK), the Best Paper Award for ICDE 2007, the Outstanding Overseas Young Scholar Award in 2003, the Best Paper of the Year Award for Computer Networks in 2002, and the Career Award in 2001 (USA). His current research interests include database theory and systems, in particular big data, data quality, data integration, distributed query processing, query languages, recommender systems, social networks and Web services.

Title: Answering queries on databases with nulls: combining correctness and efficiency

by Leonid Libkin (University of Edinburgh)

Abstract :

Computing certain answers is the standard way of answering queries over incomplete data; it is also used in many applications such as data integration, data exchange, consistent query answering, and ontology-based data access. Unfortunately, certain answers are often computationally expensive, and in most applications their complexity is intolerable once one goes beyond the class of conjunctive queries (CQs), or a slight extension thereof.

However, high computational complexity does not mean that certain answers cannot be approximated efficiently. In this talk we survey several recent results on finding such efficient and correct approximations, going significantly beyond CQs. We do so in a setting of databases with missing values and first-order (relational calculus/algebra) queries. Even the class of queries for which standard database evaluation produces correct answers is larger than previously thought. When it comes to approximations, we present two schemes with good theoretical complexity. One of them also performs very well in practice, and restores correctness of SQL query evaluation on databases with nulls.
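A standalone illustration (not taken from the talk) of why SQL evaluation on databases with nulls needs care: under SQL's three-valued logic, any comparison with NULL evaluates to "unknown", so a `NOT IN` against a set containing NULL silently returns no rows, and even tautological conditions drop NULL rows.

```python
# Demonstrate SQL's three-valued logic on nulls with an in-memory SQLite
# database: conditions that evaluate to "unknown" exclude the row.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (a INTEGER)")
conn.execute("CREATE TABLE s (b INTEGER)")
conn.execute("INSERT INTO r VALUES (1), (2)")
conn.execute("INSERT INTO s VALUES (2), (NULL)")

# Intuitively "rows of r not appearing in s"; because s contains NULL,
# every NOT IN test is unknown-or-false, so the result is empty.
rows = conn.execute(
    "SELECT a FROM r WHERE a NOT IN (SELECT b FROM s)"
).fetchall()
print(rows)  # []

# Three-valued logic also drops a NULL row even for a condition that
# holds under every interpretation of the null, such as a = 1 OR a <> 1:
conn.execute("INSERT INTO r VALUES (NULL)")
taut = conn.execute(
    "SELECT a FROM r WHERE a = 1 OR a <> 1"
).fetchall()
print(taut)  # [(1,), (2,)] -- the NULL row is silently excluded
```

Reconciling this evaluation behavior with certain-answer semantics, efficiently, is exactly the tension the talk addresses.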

Biography :

Leonid Libkin is Professor of Foundations of Data Management in the School of Informatics at the University of Edinburgh. He was previously a Professor at the University of Toronto and a member of research staff at Bell Laboratories in Murray Hill. He received his PhD from the University of Pennsylvania in 1994. His main research interests are in the areas of data management and applications of logic in computer science. He has written five books and about 200 technical papers (including 12 JACM). His awards include a Marie Curie Chair Award, a Royal Society Wolfson Research Merit Award, and five Best Paper Awards. He has chaired programme committees of major database conferences (ACM PODS, ICDT) and was the conference chair of the 2010 Federated Logic Conference. He has given many invited conference talks and has served on multiple program committees and editorial boards. He is an ACM fellow, a fellow of the Royal Society of Edinburgh, and a member of Academia Europaea.

Title: A uniform approach to data exchange and data quality

by Floris Geerts (University of Antwerp)

Abstract :

In data management, two crucial activities are data exchange and data quality, i.e., transforming data using schema mappings, and fixing conflicts and inconsistencies using data repairing. So far, these two problems have mainly been considered in isolation. In this talk, a uniform framework will be described that combines data exchange and data repairing. The underlying technique (the so-called "chase") originates from data exchange but is expanded with qualitative information to ensure that during data exchange, the quality of the migrated data improves. Furthermore, by tuning the qualitative information, many known data quality repair methods are encompassed by this approach. In addition, it is a conservative extension of the standard data exchange setting, which typically ignores data quality aspects. From a practical point of view, a scalable implementation of the revised chase is developed, resulting in a flexible and efficient open-source data quality system.

Biography :

Floris Geerts holds a research professor position at the University of Antwerp, Belgium. Before that, he held a senior research fellow position in the database group at the University of Edinburgh, and a postdoc position in the data mining group at the University of Helsinki. He received his PhD in 2001 at the University of Hasselt, Belgium. His research interests include the theory and practice of databases and the study of data quality in particular. He has received several best paper awards (ICDM 2001, ICDE 2007, ADBIS 2015) and was the recipient of the 2015 Alberto O. Mendelzon Test-of-Time Award (PODS 2015). He is an associate editor of ACM TODS, was general chair of EDBT/ICDT 2015, and is PC chair of PODS 2017.

Title: Big Data Analysis for Human Brain Functional Network Study

by Tingting Zhang (University of Virginia)

Abstract :

The human brain is a network system consisting of a large number of spatially distributed, connected regions. A deep understanding of how these regions interact to perform different brain functions will help us cure various mental and neurological diseases and create large-scale intelligent information-processing systems. Recent technological developments enable scientists to measure human brain activity at an unprecedented scope. These large data sets create opportunities as well as challenges for scientists studying the human brain. This talk reviews several research projects on brain network studies based on big human brain data and presents translational research using new brain technologies that could have an enormous impact on our daily life.

Biography :

Tingting Zhang is an Associate Professor of Statistics at the University of Virginia. She received her degree in mathematics from Peking University and her degree in statistics from Harvard University. She will serve as Program Chair-Elect of the imaging section of the American Statistical Association. Her current research interests include Bayesian statistics, human brain mapping, and computational neuroscience.

Title: A Computational Theory of Ternary-Space Big Data: Reflections on Big Data Intelligence

by Wenwu Zhu (Tsinghua University)

Abstract :

This talk first reviews the progress made over the past two years by the 973 project "Computational Theory and Methods for Ternary-Space Big Data". The project takes as its subject the big data generated across the ternary space formed by cyberspace, the physical world, and human society; its goal is the fused analysis of ternary-space big data and the generation of knowledge from it, and it studies theories and methods for deep representation learning of cross-space correlations and for crowd intelligence computing. The talk then discusses the scope of big data intelligence and explores how big data analytics can give rise to it.

Biography :

Wenwu Zhu is Vice Chair of the Department of Computer Science at Tsinghua University, Deputy Director of the National Engineering Laboratory for Big Data Algorithms and Analysis, chief scientist of a national 973 project, chief scientist of an NSFC major project on video big data, and a national "1000-Talent Program" distinguished expert. He is a Fellow of AAAS, IEEE, and SPIE, and an ACM Distinguished Scientist. He is Editor-in-Chief of IEEE Transactions on Multimedia. He previously served as a Senior Researcher at Microsoft Research Asia, Chief Scientist at Intel Research China, and a researcher at Bell Labs in the US. He has received six best paper awards from ACM, IEEE, and other venues, as well as the 2012 National Natural Science Award, Second Class (ranked second).

Title: Collaborative Computing Methods for Heterogeneous Multi-Source Big Data

by Minyi Guo (Shanghai Jiao Tong University)

Abstract :

Urban big data is the massive, multi-source, multi-modal, heterogeneous data generated across the ternary space of cyberspace, the physical world, and human society in the course of managing, living in, building, and developing cities. Mining the rich knowledge and enormous value embedded in this data provides the most objective basis for building and managing smart cities. However, urban big data is "intrinsically correlated yet externally isolated" and "massive and rich yet low-quality and fragmented", posing severe challenges for big data analytics that ultimately come down to collaborative computing over heterogeneous, multi-source big data. This talk discusses how to link data fragments scattered across the ternary space and explore the mechanisms and methods of crowd sensing; how to fuse collective human intelligence with the computing power of machines and explore theories and models of crowd cognition; and how to establish a computational theory and methodology for heterogeneous multi-source big data characterized by ternary data, crowd-intelligent cognition, and hierarchical computation.

Biography :

Minyi Guo, Ph.D., is a doctoral supervisor, a recipient of the National Science Fund for Distinguished Young Scholars, and a 2010 selectee of the national "1000-Talent Program". He is Chair of the Department of Computer Science and Engineering at Shanghai Jiao Tong University and a Zhiyuan Chair Professor there. He is a chief scientist of the 973 Program, the leader of a Ministry of Education innovation team, and a Shanghai outstanding academic leader. He has conducted systematic research in embedded and pervasive computing, parallel and distributed computing, and compilers and program optimization, and in recent years has worked on big data and smart cities. He has published over 300 papers in academic journals and conferences, holds more than 30 granted patents, and has authored four monographs with Springer and other publishers. He serves on the editorial boards of IEEE Transactions on Parallel and Distributed Systems, the Journal of Parallel and Distributed Computing, and other journals, and is an executive member of the CCF council and a CCF Fellow.

Title: Network Data Science and Big Data Analytics Systems

by Xueqi Cheng (Institute of Computing Technology, Chinese Academy of Sciences)

Abstract :


Biography :


Title: Knowledge Graph Research and Application at Baidu

by Yong Zhu (Baidu)

Abstract :

Baidu Knowledge Graph is the largest knowledge graph in the Chinese language, containing high-quality, comprehensive, and fresh data across hundreds of millions of entities and tens of billions of facts and connections. Using state-of-the-art technologies, Baidu Knowledge Graph is built from knowledge extracted from the whole Internet. We are using the knowledge graph to explore innovative ways to understand the real world and to power a wide range of applications such as question answering, semantic search, dialogue systems, and automatic content creation. This talk presents an overview of Baidu's Knowledge Graph and its applications.

Biography :

Yong Zhu is a Chief Architect at Baidu. He has been with Baidu since 2014 and led Baidu's R&D on Web Crawling, Web Data Mining, Knowledge Graph and Search Features (Baidu Aladdin Project). Yong served as Technical Committee Co-Chair of Baidu Search Company and Baidu AI Group.

Yong received his B.S. degree from the University of Science and Technology of China in 1997, M.S. degree from the Institute of Automation, Chinese Academy of Sciences in 2000, and Ph.D. in Computer Science from Georgia Institute of Technology in 2006. Prior to joining Baidu, Yong worked in areas of Search, Social, Retail and Infrastructure at Google and Amazon in US.

Yong specializes in search engine technologies and products, knowledge graph, and large-scale distributed systems. His past research areas also include pattern recognition and computer networks.

Title: From Event Detection to Data Analysis System: Decoding Intelligence

by Jianxin Li (BDBC, Beihang University)

Abstract :

After decades of steady development in burst event detection over stream data, a host of new challenges has cropped up: large user populations, networked connections, and growing volumes of heterogeneous data. This talk re-thinks the design of intelligent methods and the foundations of smart computing systems. As an example, we built Ring, a high-performance distributed burst event detection system based on stream graph models, whose key modules are Anomaly Graph Detection, Entity Portrait Summary, and Knowledge Graph. Beyond this system, we develop an incremental vector embedding technique and graph modeling methods to represent stream data, and improve the learning process with diversity regulation and stacked-kernel deep learning strategies. To support a real smart computing system, an efficient feature engineering and fault-tolerant distributed machine learning system is also investigated.

Biography :

Jianxin Li is a professor at the School of Computer Science and Engineering, Beihang University. He received his Ph.D. degree from Beihang University in 2008. He was a visiting scholar in the Machine Learning Department at CMU in 2015, and a visiting researcher at MSRA in 2011. His current research interests include data analysis and processing, network security, and virtualization.


Title: Analyzing and Speeding Up Private Blockchains

by OOI Beng Chin (National University of Singapore)

Abstract :

Blockchain technologies are taking the world by storm. Public blockchains, such as Bitcoin and Ethereum, enable secure peer-to-peer applications like cryptocurrency or smart contracts. Private blockchain systems, on the other hand, aim to disrupt applications that have so far been implemented on top of database systems, for example banking, finance, and trading applications. Multiple platforms for private blockchains are being actively developed and fine-tuned. However, there is a clear lack of a systematic framework with which different systems can be analyzed and compared against each other. Such a framework can be used to assess blockchains' viability as another distributed data processing platform, while helping developers identify bottlenecks and accordingly improve their platforms.

In this talk, I first describe BLOCKBENCH, the first evaluation framework for analyzing private blockchains. It serves as a fair means of comparison for different platforms and enables deeper understanding of different system design choices. Any private blockchain can be integrated into BLOCKBENCH via simple APIs and benchmarked against workloads based on real and synthetic smart contracts. BLOCKBENCH measures overall and component-wise performance in terms of throughput, latency, scalability, and fault tolerance. Next, we use BLOCKBENCH to conduct a comprehensive evaluation of three major private blockchains: Ethereum, Parity, and Hyperledger Fabric. The results demonstrate that these systems are still far from displacing current database systems in traditional data processing workloads. Furthermore, there are gaps in performance among the three systems, which are attributed to design choices at different layers of the blockchain's software stack. I will also discuss various strategies that may be useful for speeding up blockchain systems.
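The kind of macro measurement such a benchmark produces can be sketched in a few lines (this is a simplified illustration, not BLOCKBENCH itself; `submit` is a hypothetical stand-in for a real blockchain client call): replay a workload of transactions and derive throughput and latency statistics from the timings.

```python
# Minimal benchmarking-harness sketch: drive a transaction workload
# against a backend stub and report throughput and tail latency.
import time
import statistics

def submit(tx):
    """Stand-in for submitting a transaction and awaiting its commit."""
    time.sleep(0.001)  # pretend a commit takes about 1 ms
    return True

def run_workload(txs):
    latencies = []
    start = time.perf_counter()
    for tx in txs:
        t0 = time.perf_counter()
        assert submit(tx)                      # transaction must commit
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "throughput_tps": len(txs) / elapsed,  # committed tx per second
        "median_latency_s": statistics.median(latencies),
        "p95_latency_s": sorted(latencies)[int(0.95 * len(latencies))],
    }

metrics = run_workload([{"op": "write", "key": i} for i in range(200)])
print(metrics)
```

A real framework additionally varies cluster size (scalability), injects node failures (fault tolerance), and instruments individual layers for component-wise numbers.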

Biography :

Beng Chin is a Distinguished Professor of Computer Science, NGS faculty member and Director of IDMI at the National University of Singapore (NUS), and an adjunct Chang Jiang Professor at Zhejiang University. He obtained his BSc (1st Class Honors) and PhD from Monash University, Australia, in 1985 and 1989 respectively. His research interests include database, distributed processing, and large-scale analytics, in the aspects of system architectures, performance issues, security, accuracy and correctness. Beng Chin has served as a PC member for international conferences such as ACM SIGMOD, VLDB, IEEE ICDE, WWW, and SIGKDD, and as Vice PC Chair for ICDE'00, '04 and '06, PC co-Chair for SSD'93 and DASFAA'05, PC Chair for ACM SIGMOD'07, Core DB PC chair for VLDB'08, and PC co-Chair for IEEE ICDE'12 and IEEE Big Data'15. He is serving as a PC Chair for IEEE ICDE'18. He was an editor of the VLDB Journal and IEEE Transactions on Knowledge and Data Engineering, Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (TKDE) (2009-2012), Elsevier's co-Editor-in-Chief of the Journal of Big Data Research (2013-2015), and a co-chair of the ACM SIGMOD Jim Gray Best Thesis Award committee. He is serving as an editor of IEEE Transactions on Cloud Computing and Springer's Distributed and Parallel Databases. He is also serving as a Trustee Board Member and President of the VLDB Endowment, and an Advisory Board Member of ACM SIGMOD. He co-founded yzBigData in 2012 for big data management and analytics, and Shentilium in 2016 for AI- and data-driven finance data analytics.

Beng Chin was the recipient of the ACM SIGMOD 2009 Contributions Award, a co-winner of the 2011 Singapore President's Science Award, and the recipient of the 2012 IEEE Computer Society Kanai Award, the 2013 NUS Outstanding Researcher Award, the 2014 IEEE TCDE CSEE Impact Award, and the 2016 China Computer Federation (CCF) Overseas Outstanding Contributions Award. He is a fellow of the ACM, IEEE, and Singapore National Academy of Science (SNAS).

Title: Human-Powered Machine Learning

by Lei Chen (Hong Kong University of Science and Technology)

Abstract :

Machine learning has recently become quite popular and attractive, not only in academia but also in industry. The success stories of machine learning in AlphaGo and Texas hold 'em have raised significant interest in the field. The question is: can machine learning do everything perfectly? In this talk, I will first give several examples where current machine learning techniques have difficulty performing well. Then, I will show that by putting humans in the machine-learning loop, the results can be significantly improved. After that, I will discuss the challenges and opportunities of this human-powered machine learning paradigm.

Biography :

Lei Chen received the BS degree in computer science and engineering from Tianjin University, Tianjin, China, in 1994, the MA degree from the Asian Institute of Technology, Bangkok, Thailand, in 1997, and the PhD degree in computer science from the University of Waterloo, Canada, in 2005. He is currently a full professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. His research interests include human-powered machine learning, crowdsourcing, social media analysis, probabilistic and uncertain databases, and privacy-preserving data publishing. The system developed by his team won the excellent demonstration award at VLDB 2014. He received the SIGMOD Test-of-Time Award in 2015. He has been a PC track chair for SIGMOD 2014, VLDB 2014, ICDE 2012, CIKM 2012, and SIGMM 2011, and has served as a PC member for SIGMOD, VLDB, ICDE, SIGMM, and WWW. Currently, he serves as Editor-in-Chief of the VLDB Journal and an associate editor-in-chief of IEEE Transactions on Knowledge and Data Engineering. He is a member of the VLDB Endowment.

Title: Crowd Intelligence Resource Management: Problems and Research Progress

by Hailong Sun (BDBC, Beihang University)

Abstract :


Biography :

Hailong Sun, Ph.D., is a tenured associate professor and doctoral supervisor in the School of Computer Science and Engineering at Beihang University. His main research interests include crowd computing, software development methods, and distributed systems. He has received a National Excellent Doctoral Dissertation Award and a CCF Excellent Doctoral Dissertation Award, two National Technology Invention Awards (Second Class), and three Ministry of Education Science and Technology Progress Awards (First Class), and has been selected for the Ministry of Education's New Century Excellent Talents program and the Beijing Nova Program. Students under his supervision have won the first prize in the OW2 open-source programming contest and the Best Student Paper Award at IEEE SCC 2013.

Title: New Computer System Architectures for the Big Data Era

by Hai Jin (Huazhong University of Science and Technology)

Abstract :


Biography :


Title: Gemini: A High-Performance Big Data Analytics System Based on Graph Computing

by Wenguang Chen (Tsinghua University)

Abstract :

Existing big data analytics systems, such as MapReduce and Spark, are designed primarily for ease of programming, scalability, and fault tolerance, at the cost of processing performance. We discuss the relationship between fault tolerance and performance, argue that the two are not mutually exclusive design goals, and present Gemini, a high-performance distributed graph computing system. Gemini introduces a series of optimizations in graph partitioning, data structures, locality, fine-grained load balancing, and overlapping communication with computation. On typical graph processing applications, it requires about one tenth of the memory of Spark GraphX while running more than 100 times faster.

Biography :


Title: A Discussion of Security Protection Technologies for Industrial Control Systems

by Limin Sun (Institute of Information Engineering, Chinese Academy of Sciences)

Abstract :


Biography :



Title: New IoT Security Threats and New Defense Countermeasures

by Zhigang Kan (Bangcle Security)

Abstract :


Biography :


Title: Skyfire: Data-Driven Seed Generation for Fuzzing

by Wei Lei (Nanyang Technological University)

Abstract :

In this talk, I will present a novel data-driven seed generation approach, named Skyfire, which leverages the knowledge in the vast amount of existing samples to generate well-distributed seed inputs for fuzzing programs that process highly-structured inputs. Skyfire takes as inputs a corpus and a grammar, and consists of two steps. The first step of Skyfire learns a probabilistic context-sensitive grammar (PCSG) to specify both syntax features and semantic rules, and then the second step leverages the learned PCSG to generate seed inputs. The experimental results on several XSLT and XML engines have demonstrated that Skyfire can generate well-distributed inputs and thus significantly improve the code coverage and the bug-finding capability of fuzzers.
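A toy, context-free version of the second step can convey the idea (the paper's grammar is a context-sensitive PCSG, and these productions and probabilities are invented for illustration): productions carry probabilities learned from a corpus, and seeds are sampled by repeatedly expanding the leftmost nonterminal according to those probabilities, so generated inputs stay syntactically well-formed while covering the grammar in proportion to the learned distribution.

```python
# Sample seed inputs from a tiny probabilistic grammar for an XML-like
# language: nonterminals expand according to learned probabilities.
import random

# Hypothetical learned probabilities: nonterminal -> [(rhs, weight)].
PCFG = {
    "DOC":  [(["ELEM"], 1.0)],
    "ELEM": [(["<a>", "BODY", "</a>"], 0.6),
             (["<b/>"], 0.4)],
    "BODY": [(["text"], 0.5),
             (["ELEM"], 0.3),
             (["ELEM", "ELEM"], 0.2)],
}

def generate(symbol="DOC", max_depth=8):
    """Sample one seed by probabilistic leftmost expansion."""
    if symbol not in PCFG:           # terminal symbol: emit as-is
        return symbol
    if max_depth == 0:               # force termination on deep recursion
        return "text" if symbol == "BODY" else "<b/>"
    rhss, weights = zip(*PCFG[symbol])
    rhs = random.choices(rhss, weights=weights)[0]
    return "".join(generate(s, max_depth - 1) for s in rhs)

random.seed(0)
seeds = [generate() for _ in range(3)]
print(seeds)  # every seed has balanced <a>...</a> tags by construction
```

The context-sensitive extension additionally conditions each expansion on surrounding context, which is what lets Skyfire respect semantic rules and not just syntax.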

Biography :

Dr. Wei Lei is a Research Fellow at Nanyang Technological University, Singapore, where he received his PhD in cryptanalysis. He currently works on software vulnerability discovery and builds various tools for program analysis, and has reported dozens of vulnerabilities to Adobe and Apple, some coordinated via iDefense VCP and ZDI. His work has also appeared in top academic conferences, such as IEEE Security & Privacy and IACR Fast Software Encryption.

Title: Cybersecurity Risk Analysis and Recommendations for Industrial Enterprises

by Fawang Liu (China Software Testing Center)

Abstract :


Biography :


Title: Industrial Information Security: A Grave Situation and an Urgent Task

by Ge Zhang (National Industrial Information Security Development Research Center)

Abstract :


Biography :


Title: A Discussion of Industrial Information Security Technologies

by Bo Li (BDBC, Beihang University)

Abstract :


Biography :

Bo Li, Ph.D., is a lecturer in the School of Computer Science and Engineering at Beihang University and a researcher at the Beijing Advanced Innovation Center for Big Data and Brain Computing, where he leads research on industrial control system security and its industrial application. He has published more than 60 papers in major domestic and international journals and conferences, including IEEE Transactions, holds more than 20 granted invention patents, and has led or participated in a number of National Natural Science Foundation, 863, and 973 projects.




Contact Us

Shuai Ma


Jianxin Li



Xianhui Dong