The Third Big Data and Computational Intelligence Workshop (BDCI2016) will be held in Beijing from July 29 to 31, 2016. This three-day workshop brings together world-class speakers to present and share recent achievements in data science, computational intelligence, and their applications. The workshop is organized by the Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC) and sponsored by Beihang University, the Beijing Municipal Commission of Education, the NSFC, and the National Basic Research Program (973 Program).
Day 1 (July 29)

| Time | Talk | Speaker | Session Chair |
|---|---|---|---|
| 8:00 ~ 8:30 | Registration | | |
| 8:30 ~ 8:40 | Opening Remarks | | |
| 8:40 ~ 9:40 | The Interaction of Theory and Practice in Database Research | Ronald Fagin | Wenfei Fan |
| 9:40 ~ 10:40 | Querying Big Data: From Theory to Systems to Applications | Wenfei Fan | |
| 10:40 ~ 11:00 | Tea Break | | |
| 11:00 ~ 11:30 | Structural Information Theory: Principles for Distinguishing Order From Disorder | Angsheng Li | |
| 11:30 ~ 13:30 | Lunch | | |
| 13:30 ~ 14:30 | The Challenges of Publishing Databases | Peter Buneman | Wenfei Fan |
| 14:30 ~ 15:30 | Managing Data through the Lens of an Ontology | Maurizio Lenzerini | |
| 15:30 ~ 15:50 | Tea Break | | |
| 15:50 ~ 16:30 | Dynamical Behavior in Complex Information System | Zhiming Zheng | Shuai Ma |
| 16:30 ~ 17:00 | Simba: Towards Building Interactive Big Data Analytics Systems | Feifei Li | |
| 17:00 ~ 17:30 | Random Sampling on Big Data: Techniques and Applications | Ke Yi | |
| 17:30 ~ 18:00 | Practice and Thoughts for Big Data Analytics | Baofeng Zhang | |
| 18:00 ~ 20:00 | Dinner | | |

Day 2 (July 30)

| Time | Talk | Speaker | Session Chair |
|---|---|---|---|
| 9:00 ~ 9:40 | Deep Learning: Where to Go? The Post-Deep-Learning Age | Bo Zhang | Yongyi Mao |
| 9:40 ~ 10:40 | Transfer Learning for Reinforcement Learning | Qiang Yang | |
| 10:40 ~ 11:00 | Tea Break | | |
| 11:00 ~ 12:00 | Chatting Robots by Deep Learning | Ming Li | |
| 12:00 ~ 13:30 | Lunch | | |
| 13:30 ~ 14:30 | Towards Empathetic Human-Robot Interactions | Pascale Fung | Qiang Yang |
| 14:30 ~ 15:30 | Understanding Deep Learning and Neural Semantics | Xiaogang Wang | |
| 15:30 ~ 15:50 | Tea Break | | |
| 15:50 ~ 16:20 | Big Learning with Bayesian Methods | Jun Zhu | Qiang Yang |
| 16:20 ~ 16:50 | DERF: Distinctive Efficient Robust Features from the Biological Modeling of the P Ganglion Cells | Yunhong Wang | |
| 16:50 ~ 19:00 | Dinner | | |

Day 3 (July 31)

| Time | Talk | Speaker | Session Chair |
|---|---|---|---|
| 9:00 ~ 9:30 | Building a Talking Parrot with MS Cognitive Service | Ming Zhou | Chunming Hu |
| 9:30 ~ 10:00 | From Big Data to Intelligence: Industrial Data Mining Practices in Baidu | Zhiyong Shen | |
| 10:00 ~ 10:30 | Building Interactive Analytics from the Ground Up: Architecture, Tradeoffs and a Lot of Work | Ning Xie | |
| 10:30 ~ 10:50 | Tea Break | | |
| 10:50 ~ 11:20 | ADAS Based on Deep Learning: Development and Evaluation | Yinan Yu | Yunpeng Wang |
| 11:20 ~ 11:50 | The Technology and Application of YESWAY Telematics Big Data | Ziyue Li | |
| 11:50 ~ 13:30 | Lunch | | |
| 13:30 ~ 14:00 | Data-Driven Security: Big Data Analytics and Cyberspace Security | Xiaosheng Tan | Jianxin Li |
| 14:00 ~ 14:30 | Applying Big Data and Learning Techniques to Problems in Security | Eric Hsu | |
| 14:30 ~ 15:00 | Ring: Emerging Event Detection in Social Networks and Data Processing Platform | Jianxin Li | |
| 15:00 ~ 15:20 | Tea Break | | |
| 15:20 ~ 15:50 | Big Data Driven Intelligent Software Development Techniques and Environment | Bing Xie | Xudong Liu |
| 15:50 ~ 16:20 | Distributed Multi-Layer Indexing Scheme for Server-centric Cloud Storage System | Xiaofeng Gao | |
| 16:20 ~ 16:50 | BrainQuest: Perception-Guided Brain Network Comparison | Lei Shi | |
| 16:50 ~ 17:20 | The Representation and Embedding of Knowledge Bases | Richong Zhang | |
| 17:20 ~ 17:50 | Spatio-temporal Crowdsourcing: A Novel Computation Paradigm in the Era of the Sharing Economy | Yongxin Tong | |
The Interaction of Theory and Practice in Database Research
by Ronald Fagin (IBM Research – Almaden)
The speaker will talk about applying theory to practice, with a focus on two IBM case studies. In the first case study, the practitioner initiated the interaction. This interaction led to the following problem. Assume that there is a set of “voters” and a set of “candidates”, where each voter assigns a numerical score to each candidate. There is a scoring function (such as the mean or the median), and a consensus ranking is obtained by applying the scoring function to each candidate’s scores. The problem is to find the top k candidates, while minimizing the number of database accesses. The speaker will present an algorithm that is optimal in an extremely strong sense: not just in the worst case or the average case, but in every case! Even though the algorithm is only 10 lines long (!), the paper containing the algorithm won the 2014 Gödel Prize, the top prize for a paper in theoretical computer science. The interaction in the second case study was initiated by theoreticians, who wanted to lay the foundations for “data exchange”, in which data is converted from one format to another.
Although this problem may sound mundane, the issues that arise are fascinating, and this work made data exchange a new subfield, with special sessions in every major database conference.
This talk will be completely self-contained, and the speaker will derive morals from the case studies. The talk is aimed at both theoreticians and practitioners, to show them the mutual benefits of working together.
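To make the top-k problem from the first case study concrete, here is a minimal Python sketch. We assume the algorithm referred to is the Threshold Algorithm (TA) of Fagin, Lotem and Naor; the function name `threshold_algorithm`, the in-memory lists standing in for database access, and the toy `voters` data are all illustrative assumptions, not the speaker's code. TA reads each voter's list in descending score order ("sorted access"), looks up every newly seen candidate's full set of scores ("random access"), and stops as soon as the k-th best aggregate is at least the aggregate of the last scores read, since by monotonicity no unseen candidate can beat it.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Score = float
Ranked = List[Tuple[str, Score]]  # one voter's (candidate, score) list, best first


def mean(xs: Sequence[Score]) -> Score:
    return sum(xs) / len(xs)


def threshold_algorithm(
    lists: Sequence[Ranked],
    k: int,
    agg: Callable[[Sequence[Score]], Score] = mean,
) -> List[Tuple[Score, str]]:
    """Top-k candidates under a monotone aggregation function (TA sketch).

    Assumes every list scores every candidate and is sorted by descending score.
    """
    # Random access: look up any candidate's score in any list.
    lookup: List[Dict[str, Score]] = [dict(lst) for lst in lists]
    top: List[Tuple[Score, str]] = []  # min-heap holding the best k seen so far
    seen = set()
    for depth in range(len(lists[0])):
        # Sorted access: read the next entry from every list in parallel.
        row = [lst[depth] for lst in lists]
        for cand, _ in row:
            if cand in seen:
                continue
            seen.add(cand)
            score = agg([lk[cand] for lk in lookup])
            if len(top) < k:
                heapq.heappush(top, (score, cand))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, cand))
        # No unseen candidate can beat the aggregate of the last scores read.
        threshold = agg([s for _, s in row])
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)


voters = [
    [("x", 0.9), ("y", 0.8), ("z", 0.1)],
    [("y", 0.9), ("x", 0.7), ("z", 0.2)],
]
print([c for _, c in threshold_algorithm(voters, k=2)])  # ['y', 'x']
```

In this toy run the loop stops after two rounds of sorted access, so candidate "z" is never looked up; the aggregation must be monotone (the mean and the median both are) for the stopping rule to be sound.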
Ronald Fagin is an IBM Fellow at IBM Research – Almaden. IBM Fellow is IBM's highest technical honor. There are currently around 90 active IBM Fellows (out of around 400,000 IBM employees worldwide), and there have been only around 250 IBM Fellows in the over 50-year history of the program. Fagin received his B.A. in mathematics from Dartmouth College and his Ph.D. in mathematics from the University of California at Berkeley. He is a Fellow of IEEE, ACM, and AAAS (American Association for the Advancement of Science). He has co-authored four papers that won Best Paper Awards and three papers that won Test-of-Time Awards, all in major conferences. He was named Docteur Honoris Causa by the University of Paris. He won the IEEE Technical Achievement Award, the IEEE W. Wallace McDowell Award, and the ACM SIGMOD Edgar F. Codd Innovations Award (a lifetime achievement award in databases). He is a member of the US National Academy of Engineering and the American Academy of Arts and Sciences.
Querying Big Data: From Theory to Systems to Applications
by Wenfei Fan (BDBC, Beihang University; University of Edinburgh)
Querying big data is a departure from our familiar database techniques and even from classical computational complexity theory. This talk advocates a resource-constrained paradigm for querying big data with bounded resources. Underlying the paradigm are a theory of bounded evaluation and a resource-bounded approximation scheme. Following the paradigm, two systems are under development: BEAS, a system for querying big relations with constrained resources, and GRAPE, a system for parallelizing sequential graph algorithms with guarantees on termination and correctness. As an application, we show how querying big data can help social media marketing through association rules for graphs.
Wenfei Fan is the Chair Professor of Web Data Management in the School of Informatics, University of Edinburgh, UK, and the director of the International Research Center on Big Data (RCBD), Beihang University, China. He is a Fellow of the Royal Society of Edinburgh, a Fellow of the ACM, a National Professor of the 1000-Talent Program, and a Yangtze River Scholar, China. He received his PhD from the University of Pennsylvania, and his MS and BS from Peking University. He is a recipient of an ERC Advanced Fellowship in 2015, the ACM PODS Alberto O. Mendelzon Test-of-Time Award in 2010 and 2015, the Roger Needham Award in 2008 (UK), the NSF Career Award in 2001 (USA), and several Best Paper Awards (VLDB 2010, ICDE 2007, and Computer Networks 2002). His current research interests include database theory and systems, in particular big data, data quality, data integration, distributed query processing, recommender systems, social networks, and Web services.
Deep Learning: Where to Go? The Post-Deep-Learning Age
by Bo Zhang (Tsinghua University)
Deep learning has brought us into the big data age and has greatly advanced artificial intelligence. Thanks to its practicality and generality, deep learning has tremendously improved performance in image, speech, and text processing. However, because deep learning is a black-box, brute-force learning method, it can discover only statistical correlations rather than causal relations; it has difficulty handling raw data effectively, and its generalization ability is low. To overcome these drawbacks, deep learning has to learn further from brain science. This is also the direction in which artificial intelligence will develop in the post-deep-learning age.
Bo Zhang is a professor in the Department of Computer Science and Technology at Tsinghua University and a fellow of the Chinese Academy of Sciences. He graduated from the Department of Automatic Control at Tsinghua University in 1958 and has been a faculty member there ever since. From February 1980 to February 1982, he was a visiting scholar at the University of Illinois at Urbana-Champaign, USA. In 2011, Hamburg University awarded him an honorary Doctor of Natural Sciences degree. He is a member of the Technical Advisory Board of Microsoft Research Asia, and he won the Microsoft Research Outstanding Collaborator Award in 2016. His research covers artificial intelligence, artificial neural networks, and machine learning, as well as the application of these theories to pattern recognition, knowledge engineering, and robotics. In these fields, he has published over 200 papers and 5 monographs (chapters). Recently, he founded a research group on cognitive computation and multimedia information processing, which has obtained important results in machine learning, image and video analysis, and retrieval.
The challenges of publishing databases
by Peter Buneman (University of Edinburgh)
Nearly all the information that is published on the Web uses some form of database technology. This information ranges from "born digital" scientific data (such as images and sensor output) to "human generated" data (what we find in scholarly journals and online reference works). Dealing with this information in any coherent way poses major challenges for computer science. For example, many organizations ask us to cite data in the same way that we traditionally cite scholarly articles, but how do we cite something that has been extracted by a query from a complex, evolving database?
There are many other related challenges. Very few of these "publications" are static. How does one record the evolution of a database? When, as often happens, data is copied from one database to another, how does one record its provenance? There are also interesting social phenomena associated with the creation of these information artefacts. I shall attempt to give an account of the new database research issues associated with this form of publishing.
Peter Buneman is Professor of Database Systems in the School of Informatics at the University of Edinburgh. His work in computer science has focused mainly on databases and programming languages, specifically: database semantics, approximate information, query languages, types for databases, data integration, bioinformatics and semi-structured data. He has recently worked on issues associated with scientific databases such as data provenance, archiving and annotation. In addition he has made contributions to graph theory and to the mathematics of phylogeny. He has served on numerous program committees, editorial boards and working groups, and has been program chair for the leading database theory and systems conferences: ACM SIGMOD, ACM PODS, VLDB and ICDT. Recently he has initiated a project that has provided high-speed internet access to some of the most remote communities of Scotland. He is a fellow of the Royal Society, a fellow of the Royal Society of Edinburgh, a fellow of the ACM and the recipient of a Royal Society Wolfson Merit Award.
Managing data through the lens of an ontology
by Maurizio Lenzerini (Sapienza University)
Ontology-based data management is a new paradigm for managing data through the lens of a conceptualization of the domain of interest, called an ontology. This paradigm provides several interesting features, many of which have already proved effective in managing complex information systems. On the other hand, several important issues remain open and constitute stimulating challenges for the research community. In this talk, we first provide an introduction to ontology-based data management, illustrating the main ideas and techniques for using an ontology to access the data layer of an information system, and then we discuss several important issues that are currently the subject of extensive investigation.
Maurizio Lenzerini (http://www.dis.uniroma1.it/~lenzerini) is a Professor of Data Management at the Department of Computer, Control, and Management Engineering of Sapienza University of Rome. He conducts research on Database Theory, Data Management, Knowledge Representation and Automated Reasoning, and Ontology-based Data Access and Integration. He is the author of more than 300 publications on these topics, which have received more than 22,000 citations. He is the recipient of two IBM Faculty Awards, a Fellow of ECCAI (European Coordinating Committee for Artificial Intelligence), a Fellow of the ACM (Association for Computing Machinery), and a member of the Academia Europaea - The European Academy.
Transfer Learning for Reinforcement Learning
by Qiang Yang (Hong Kong University of Science and Technology)
Reinforcement learning aims to simulate human intelligent planning activities. However, the state-space limitation is the major bottleneck in achieving this objective. The development and wide application of deep learning have made it possible to combine reinforcement learning organically with deep learning. Building on this development, transfer learning can be introduced to adapt a model that reinforcement learning has learned from a large-scale dataset to a new setting using a relatively small dataset. This technique could deliver personalized models for individual needs. In this talk, the basic principles of reinforcement transfer learning and examples of its application to dialogue systems will be discussed.
Qiang Yang is the Chair Professor and Department Head of CSE, HKUST, in Hong Kong. He was the founding head of Noah’s Ark Lab (2012-2014) and the Program Chair of IJCAI 2015. He is the first AAAI Fellow among Chinese scientists worldwide. He is also an IEEE Fellow, AAAS Fellow, IAPR Fellow, and an ACM Distinguished Scientist. His research interests are artificial intelligence and big data. He is the founding Editor in Chief of the ACM Transactions on Intelligent Systems and Technology (ACM TIST) and of the IEEE Transactions on Big Data. He is also the Vice Chair of the Chinese AI Society and a Trustee of AAAI/IJCAI.
Chatting Robots by Deep Learning
by Ming Li (University of Waterloo, and RSVP Technologies Inc.)
Humans differ from chimps because we talk. Making computers talk like humans is a dream of artificial intelligence. However, it is far more difficult than making them play Weiqi (Go). A conversation needs to be natural, sensible, and robust. Template-based and keyword-based methods, as used in Echo or Siri, are either too restrictive or error-prone. We have partially resolved these problems and designed the chatbot Doudou using a new deep learning architecture and a massive amount of data. Doudou significantly improves the state of the art, chatting naturally, wisely, knowledgeably, and robustly.
Joint work with Kun Xiong, Anqi Cui, and Zefeng Zhang.
Ming Li is a Canada Research Chair in Bioinformatics and a University Professor at the University of Waterloo. He is a fellow of the Royal Society of Canada, ACM, and IEEE. He is a recipient of Canada's E.W.R. Steacie Fellowship Award in 1996, the 2001 Killam Fellowship, and the 2010 Killam Prize. Together with Paul Vitanyi, he pioneered the applications of Kolmogorov complexity and co-authored the book "An Introduction to Kolmogorov Complexity and Its Applications". His recent research interests include bioinformatics, natural language processing, deep learning, and information distance.
Towards Empathetic Human-Robot Interactions
by Pascale Fung (Hong Kong University of Science and Technology)
“Sorry, I didn’t hear you” may be the first empathetic utterance by a commercial machine. Since the late 1990s, when speech companies began providing customer-service software, programmed to use different phrases, to numerous other companies, people have gotten used to speaking to machines. As people interact more often with voice- and gesture-controlled machines, they expect the machines to recognize different emotions and to understand other high-level communication features such as humor, sarcasm, and intention. To make such communication possible, machines need an empathy module: a software system that can extract emotions from human speech and behavior and decide the robot's response accordingly. Although research on empathetic robots is still at an early stage, current methods use signal processing techniques, sentiment analysis, and machine learning algorithms to build robots that can ‘understand’ human emotion. Other aspects of human-robot interaction include facial expression and gesture recognition, and robot movement to convey emotion and intent. We propose Zara the Supergirl as a prototype system. It is a software-based virtual android that presents itself on screen as an animated cartoon character. Along the way, it will get ‘smarter’ and more empathetic by using machine learning algorithms, gathering more data, and learning from it. In this talk, I will give an overview of multi-channel recognition and expression of emotion and intent. I will present our work so far on deep learning for emotion and sentiment recognition, as well as humor recognition. I hope to explore the future direction of android development and how it can help improve people's lives.
Pascale Fung is a Professor of Electronic and Computer Engineering at Hong Kong University of Science and Technology. She was elected Fellow of the Institute of Electrical and Electronics Engineers for her contributions to human-machine interactions. She is one of the founding faculty members of the Human Language Technology Center (HLTC) at HKUST, Director of InterACT@HKUST, and the founding chair of the Women Faculty Association at HKUST.
Understanding Deep Learning and Neural Semantics
by Xiaogang Wang (Chinese University of Hong Kong)
Deep learning has achieved great success in computer vision. Many people believe that this success is due to employing a huge number of parameters to fit big training data. In this talk, I will show that the neuron responses of deep models have clear semantic interpretations, as supported by our research in multiple fields: face recognition, object tracking, human pose estimation, and crowd video analysis. In particular, the responses of neurons in the top layers are sparse and strongly selective to object classes, attributes, and identities, and sparseness and selectiveness are strongly correlated. Such selectiveness is obtained naturally through large-scale training, without adding extra regularization during the training process. Understanding neural semantics has inspired us to develop new network architectures and training strategies that effectively improve a broad range of applications: face recognition, face detection, neural network compression, object tracking, learning structured feature representations for human pose estimation, and learning dynamic feature representations of different semantic units for video understanding.
Xiaogang Wang received his Bachelor's degree in Electronic Engineering and Information Science from the Special Class of Gifted Young at the University of Science and Technology of China in 2001, his M.Phil. degree in Information Engineering from the Chinese University of Hong Kong in 2004, and his PhD degree in Computer Science from the Massachusetts Institute of Technology in 2009. He has been an associate professor in the Department of Electronic Engineering at the Chinese University of Hong Kong since August 2009. He received the PAMI Young Research Award Honorable Mention in 2016, the Outstanding Young Researcher in Automatic Human Behaviour Analysis Award in 2011, the Hong Kong RGC Early Career Award in 2012, and the Young Researcher Award of the Chinese University of Hong Kong. He is an associate editor of Image and Vision Computing, Computer Vision and Image Understanding, and IEEE Transactions on Circuits and Systems for Video Technology. He was an area chair of ICCV 2011, ICCV 2015, ECCV 2014, ECCV 2016, ACCV 2014, and ACCV 2015. His research interests include computer vision, deep learning, crowd video surveillance, object detection, and face recognition.
Structural Information Theory: Principles for Distinguishing Order from Disorder
by Angsheng Li (Institute of Software, Chinese Academy of Sciences)
It has been a great challenge in computer science and information science to define the information embedded in physical systems that allows order to be distinguished from disorder. In 1953, Shannon himself posed the problem of establishing a structural theory of information to support communication network analysis. The challenge had not been satisfactorily resolved until we recently established the first such theory, structural information theory, which I will introduce in this talk. Our theory supports communication network analysis, provides principles for distinguishing order from disorder in physical systems and for analyzing big data, and lays the foundation for a theory of the dynamical complexity of networks. It also establishes the principle for a next-generation search engine, leading to new directions in the area of network algorithms. I will also introduce some follow-up directions of our theory.
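As a small, hedged illustration of measuring information in a graph's structure, the sketch below computes what we understand to be the one-dimensional structural entropy of an undirected graph in the Li and Pan formulation, H^1(G) = -Σ_i (d_i/2m) log2(d_i/2m), where d_i is the degree of node i and m the number of edges. The function name and example graphs are our own assumptions; the theory's higher-dimensional structural entropy, defined over hierarchical partitions of the graph, is not shown here.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def one_dim_structural_entropy(edges: Iterable[Tuple[int, int]]) -> float:
    """H^1(G) = -sum_i (d_i / 2m) * log2(d_i / 2m) for an undirected edge list."""
    degree: Counter = Counter()
    m = 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        m += 1
    if m == 0:
        return 0.0
    two_m = 2 * m
    # d_i / 2m is the stationary probability of a simple random walk at node i.
    return -sum((d / two_m) * math.log2(d / two_m) for d in degree.values())


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
star4 = [(0, 1), (0, 2), (0, 3)]
print(one_dim_structural_entropy(cycle4))  # 2.0
print(one_dim_structural_entropy(star4))
```

The regular 4-cycle reaches the maximum for four nodes (2 bits), while the star, whose random walk concentrates on the hub, scores lower.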
Angsheng Li is a research professor at the Institute of Software, Chinese Academy of Sciences. He was born in 1964. He received his bachelor's degree in mathematics from Yunnan Normal University in 1984 and his Ph.D. from the Institute of Software, Chinese Academy of Sciences, in 1993, and he has worked at the Institute ever since. From 1998 to 2002, he was a visitor and research fellow at the University of Leeds, UK, working with Professor Barry Cooper (an academic descendant of Alan Turing) on computability theory. In 2003, he received the Distinguished Young Investigator award of the National Natural Science Foundation of China. In 2008, he was selected for the Hundred Talent Program of the Chinese Academy of Sciences. From 2008 to 2009, he was a visiting scientist in the Computer Science Department of Cornell University, US, working with Professor Juris Hartmanis (a founder of computational complexity theory). His research areas include computability theory, computational theory, and network theory.
Dynamical Behavior in Complex Information System
by Zhiming Zheng (Beihang University)
A complex information system is a system composed of many interacting informational parts, e.g., complex networks, stock markets, and human society. The study of complex information systems draws on techniques and ideas from a wide range of areas, including mathematics, physics, and computer simulation. The speaker will talk about his research group's work on the celerity, extensiveness, accuracy, stability, and safety of complex information systems from the viewpoint of information science. In particular, two examples illustrating celerity and extensiveness will be introduced: finding shortest paths in complex networks, and explosive percolation in random graphs.
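To give a feel for explosive percolation, here is a minimal simulation sketch. We assume the standard Achlioptas process with the "product rule" of Achlioptas, D'Souza and Spencer as a representative model; the talk's own model may differ, and the names `DisjointSet` and `product_rule_process` are hypothetical. At each step two random candidate edges are drawn and only the one joining the smaller pair of clusters (smaller product of cluster sizes) is added, which delays the emergence of a giant cluster and then makes it appear abruptly; the original study reports the jump near 0.888n added edges.

```python
import random
from typing import List, Tuple


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> int:
        """Merge the clusters of a and b; return the size of the merged cluster."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            if self.size[ra] < self.size[rb]:
                ra, rb = rb, ra
            self.parent[rb] = ra
            self.size[ra] += self.size[rb]
        return self.size[ra]

    def cluster_product(self, edge: Tuple[int, int]) -> int:
        """Product of the cluster sizes at the two endpoints of a candidate edge."""
        u, v = edge
        return self.size[self.find(u)] * self.size[self.find(v)]


def product_rule_process(n: int, steps: int, seed: int = 0) -> List[float]:
    """Largest-cluster fraction after each step of an Achlioptas process (product rule)."""
    rng = random.Random(seed)
    ds = DisjointSet(n)
    largest = 1
    history: List[float] = []
    for _ in range(steps):
        # Draw two candidate edges uniformly at random ...
        e1 = (rng.randrange(n), rng.randrange(n))
        e2 = (rng.randrange(n), rng.randrange(n))
        # ... and keep the one that joins the smaller pair of clusters.
        chosen = e1 if ds.cluster_product(e1) <= ds.cluster_product(e2) else e2
        largest = max(largest, ds.union(*chosen))
        history.append(largest / n)
    return history


frac = product_rule_process(n=10_000, steps=10_000)
print(frac[4_000], frac[-1])  # largest-cluster fraction early vs. late in the process
```

Replacing the product-rule choice with `chosen = e1` recovers the ordinary Erdős-Rényi process, which makes a convenient baseline for comparing how sharply the giant cluster appears.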
Zhiming Zheng is a professor in the School of Mathematics and System Sciences at Beihang University and director of the Key Laboratory of Mathematics, Information and Behavioral Semantics, Ministry of Education. He received his PhD, MS, and BS from Peking University. His research interests include complex information systems, dynamical cryptography, and dynamical systems. He currently serves as chief editor of the Journal of Mathematics in Computer Science and the Journal of Mathematical Biosciences and Engineering. He won the first prize of the State Technological Invention Award and the Heliang Heli (Ho Leung Ho Lee) Prize.
Simba: Towards Building Interactive Big Data Analytics Systems
by Feifei Li (University of Utah)
Interactive queries over large data have become a critical requirement in many applications. It is therefore essential to provide fast, scalable, and high-throughput query processing and analytics. We will present the Simba system, which offers scalable and efficient in-memory spatial query processing and analytics for big spatial and multimedia data. Simba is based on Spark and runs over a cluster of commodity machines. In particular, Simba extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API. It introduces the concept and construction of indexes over RDDs in order to work with big spatial data and complex spatial operations. Lastly, Simba implements an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations to achieve both low latency and high throughput. Extensive experiments over large datasets demonstrate Simba's superior performance compared with other big data analytics systems. Through its SQL and DataFrame API, Simba provides interactive analytics over big data; when data grows too large or computation becomes too expensive, we achieve interactivity through online analytics.
Feifei Li is currently an associate professor at the School of Computing, University of Utah. He obtained his Bachelor's degree from Nanyang Technological University (transferred from Tsinghua University) in 2001 and his PhD from Boston University in 2007. His research focuses on improving the scalability, efficiency, and effectiveness of database and big data systems. He also works on data security problems in these systems. He is a recipient of an NSF CAREER Award in 2011, two HP IRP awards in 2011 and 2012, a Google App Engine award in 2013, an IEEE ICDE Best Paper Award in 2004, the IEEE ICDE 10+ Years Most Influential Paper Award in 2014, a Google Faculty Award in 2015, the SIGMOD 2015 Best Demonstration Award, and the SIGMOD 2016 Best Paper Award. He was the demo PC co-chair for VLDB 2014, the general co-chair for SIGMOD 2014, and a PC area chair for ICDE 2014 and SIGMOD 2015, and he currently serves as an associate editor for IEEE TKDE.
Random Sampling on Big Data: Techniques and Applications
by Ke Yi (Hong Kong University of Science and Technology)
Random sampling is a powerful tool for big data analytics. It can be used whenever complete accuracy is not required, while offering order-of-magnitude improvement in query efficiency. Although random sampling has been well studied in the statistics literature, new big data systems and applications call for new sampling algorithms that conform to the requirements and constraints posed by these systems. In this talk, I will discuss some of our recent progress on using random sampling for tackling big data problems, including sampling over streaming data, spatial data, relational and graph-structured data.
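For the streaming case, the classic building block is reservoir sampling. The sketch below is textbook Algorithm R, offered only as background under our own naming (`reservoir_sample` is hypothetical) and not as the speaker's recent algorithms: it keeps a uniform random sample of k items from a stream of unknown length in one pass and O(k) memory.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: Optional[random.Random] = None) -> List[T]:
    """Uniform sample of k items from a stream of unknown length (Algorithm R)."""
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000_000), k=5, rng=random.Random(7))
print(sample)
```

After processing n items, every item is in the reservoir with probability k/n, which is what makes the sample usable for unbiased approximate answers (e.g. scaling a sample count by n/k).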
Ke Yi is an Associate Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He obtained his Bachelor's degree from Tsinghua University (2001) and PhD from Duke University (2006), both in computer science. Ke's research spans theoretical computer science and database systems. He has received a Google Faculty Research Award (2010), the Young Investigator Research Award from HKUST (2012), a SIGMOD Best Demonstration Award (2015), and the SIGMOD Best Paper Award (2016). He currently serves as an Associate Editor of ACM Transactions on Database Systems and IEEE Transactions on Knowledge and Data Engineering.
Practice and Thoughts for Big Data Analytics
by Baofeng Zhang (Noah’s Ark Lab at Huawei)
From big data to machine learning, and from machine learning to general AI, new concepts continuously emerge. But have the current technologies achieved real breakthroughs, and do they really meet the requirements? What are the real value and challenges? It is hard for anyone to give a conclusion or judgment right now. This talk will share some real cases for discussion.
Baofeng Zhang is a founder and vice director of Noah’s Ark Lab at Huawei, whose vision is “Building Better Connected World with Data Mining and Artificial Intelligence Technologies”.
Baofeng Zhang has extensive experience in R&D and management in the ICT industry (e.g., software design and development, requirement analysis, system architecture design, standards, and research), and he has joined and led a wide range of research and development projects (e.g., campus calling debit cards on circuit-based switches, intelligent LAN switches/edge routers, telco big data, and intelligent terminals …).
He is also active in professional service: he is a member of the core expert group of the national "Important Special Projects of Crucial Electronic Devices, Hi-end General Chips and Basic Software Products" Program and a member of the CCF (China Computer Federation) Big Data Technical Committee. He has headed delegations to numerous national standards development events and actively participated in international standards development, e.g., in ITU-T and ETSI.
Big Learning with Bayesian Methods
by Jun Zhu (Tsinghua University)
Bayesian methods represent one important school of statistical methods for learning, inference, and decision making. At their core is Bayes' theorem, which dates back more than 250 years. In the Big Data era, however, many challenges, ranging from theory to algorithms and applications, need to be addressed. In this talk, I will introduce some recent developments in generalizing Bayes' theorem to incorporate rich side information (for example, a large-margin property we would like to impose on the model distribution, or domain knowledge collected from experts or the crowd), as well as in scalable online learning and distributed inference algorithms. The generic framework for such tasks is called regularized Bayesian inference (RegBayes). I will introduce the basic ideas of RegBayes as well as several concrete examples with scalable inference algorithms and deep models.
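The step from Bayes' theorem to RegBayes is easiest to see in the optimization view of Bayesian inference. The block below writes Bayes' theorem for a model M, data D and prior π(M), shows that the posterior is the minimizer of a KL-plus-likelihood objective, and then gives the RegBayes problem as we understand it from the published formulation: a penalty U(ξ) on slack variables ξ is added and q is constrained to a feasible set P_post(ξ) encoding the side information. The notation is our own and should be checked against the original papers.

```latex
% Bayes' theorem: posterior over model M given data D, with prior \pi(M)
p(M \mid D) = \frac{\pi(M)\, p(D \mid M)}{p(D)}

% Optimization view: for any distribution q(M),
%   KL(q || \pi) - E_q[\log p(D|M)] = KL(q || p(M|D)) - \log p(D),
% so the minimizer over q is exactly the Bayesian posterior.
p(M \mid D) = \operatorname*{arg\,min}_{q(M)} \;
  \mathrm{KL}\big(q(M) \,\|\, \pi(M)\big) - \mathbb{E}_{q(M)}\big[\log p(D \mid M)\big]

% RegBayes (as we understand it): add a convex penalty U(\xi) on slack variables \xi
% and require q to lie in a feasible set defined by the side information
\inf_{q(M),\, \xi} \;
  \mathrm{KL}\big(q(M) \,\|\, \pi(M)\big) - \mathbb{E}_{q(M)}\big[\log p(D \mid M)\big] + U(\xi)
  \quad \text{s.t.} \quad q(M) \in \mathcal{P}_{\mathrm{post}}(\xi)
```

When the feasible set places no constraint on q and U is zero, the last problem reduces to the middle one, so standard Bayesian inference is recovered as a special case.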
Jun Zhu is an Associate Professor at the Department of Computer Science and Technology in Tsinghua University and an Adjunct Faculty at the Machine Learning Department in Carnegie Mellon University. He received his Ph.D. in Computer Science from Tsinghua in 2009. Before joining Tsinghua in 2011, he did post-doctoral research at the Machine Learning Department in Carnegie Mellon University. His current work involves both the foundations of machine learning and the applications in social network analysis, data mining, and multi-media data analysis.
Prof. Zhu has published over 80 peer-reviewed papers in prestigious conferences and journals. He is an associate editor for IEEE Trans. on PAMI. He served as area chair/senior PC for ICML (2014, 2015, 2016), NIPS (2013, 2015), IJCAI (2013, 2015), UAI (2014, 2015, 2016), and AAAI (2016, 2017). He was a local co-chair of ICML 2014. He is a recipient of the Microsoft Fellowship (2007), CCF Distinguished PhD Thesis Award (2009), IEEE Intelligent Systems "AI's 10 to Watch" Award (2013), NSFC Excellent Young Scholar Award (2013), CCF Young Scientist Award (2013), and CVIC SE Talents Award (2015). His work is supported by the "National Youth Top-notch Talent Support Program" and the "Tsinghua 221 Basic Research Plan for Young Talents".
DERF: Distinctive Efficient Robust Features from the Biological Modeling of the P Ganglion Cells
by Yunhong Wang (Beihang University)
Studies in neuroscience and biological vision have shown that the human retina has strong computational power, and that its information representation supports vision tasks on both the ventral and dorsal pathways. A new local image descriptor, termed Distinctive Efficient Robust Features (DERF), is derived by modeling the response and distribution properties of the parvocellular-projecting ganglion cells (P-GCs) in the primate retina. DERF features an exponential scale distribution, an exponential grid structure, and the circularly symmetric Difference of Gaussians (DoG) function as its convolution kernel. All of these are consistent with the characteristics of the ganglion cell array found in neurophysiology, anatomy, and biophysics. In addition, a new explanation for local descriptor design is presented from the perspective of wavelet tight frames. DoG is naturally a wavelet, and the structure of the grid point array in our descriptor is closely related to the spatial sampling of wavelets. The DoG wavelet itself forms a frame, and when we adjust the parameters of our descriptor to make the frame tighter, the performance of the DERF descriptor improves accordingly. This is verified by designing a tight frame DoG (TF-DoG), which leads to much better performance. Extensive image matching experiments on the Multiview Stereo Correspondence Data set demonstrate that DERF outperforms state-of-the-art hand-crafted and learned descriptors, while remaining robust and being much faster to compute.
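The following minimal pure-Python sketch makes the kernel concrete. It samples a circularly symmetric DoG kernel on a square grid and generates an exponential scale schedule. The function names, sigma values and scale ratio are illustrative assumptions. This is not the DERF implementation, whose grid layout and TF-DoG parameters are not given here.

```python
import math


def gaussian_2d(x, y, sigma):
    """Isotropic 2-D Gaussian, normalized to integrate to 1 over the plane."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)


def dog_kernel(radius, sigma_center, sigma_surround):
    """Sample a circularly symmetric Difference-of-Gaussians kernel on a
    (2*radius+1) x (2*radius+1) grid: narrow center minus wide surround."""
    return [
        [gaussian_2d(x, y, sigma_center) - gaussian_2d(x, y, sigma_surround)
         for x in range(-radius, radius + 1)]
        for y in range(-radius, radius + 1)
    ]


def exponential_scales(base_sigma, ratio, levels):
    """Hypothetical exponential scale schedule: sigma_k = base_sigma * ratio**k."""
    return [base_sigma * ratio ** k for k in range(levels)]


kernel = dog_kernel(radius=6, sigma_center=1.0, sigma_surround=1.6)
print(exponential_scales(1.0, 2.0, 4))  # [1.0, 2.0, 4.0, 8.0]
```

Subtracting a wider surround from a narrow center produces a positive center and a negative surround. Because both Gaussians are normalized, the kernel entries sum to roughly zero, which is the band-pass behavior that lets DoG act as a wavelet.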
Yunhong Wang is a professor in the School of Computer Science and Engineering, Beihang University, Beijing, where she is also the Director of the Laboratory of Intelligent Recognition and Image Processing, Beijing Key Laboratory of Digital Media. She received the B.S. degree from Northwestern Polytechnical University, Xi'an, China, in 1989, and the M.S. and Ph.D. degrees from Nanjing University of Science and Technology, Nanjing, China, in 1995 and 1998, respectively. She was with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China, from 1998 to 2004. Her current research interests include biometrics, pattern recognition, computer vision, data fusion, and image processing. In these fields, she has published over 200 papers, which have been cited more than 9,000 times according to Google Scholar.
Building a Talking Parrot with MS Cognitive Service
by Ming Zhou (Microsoft Research Asia)
Our goal is to equip any computer system or device with the capability of chatting. Based on a set of technologies in big data, knowledge bases, natural language understanding and deep learning, we have developed a chat-bot solution comprising chit-chat, Q&A and dialog systems. We have also contributed deeply to a set of chat-bot products such as XiaoIce (launched in China), Rinna (launched in Japan) and Tay (to be launched in the US).
Polly is a talking parrot with chatting, speech and vision capabilities. It is powered by state-of-the-art chat-bot technology as well as the speech and vision APIs of MS Recognition Service. Polly can interact with users in the style of a parrot, with simulated voice, gestures, chat and facial expressions. It can serve as a virtual guide in various life and work scenarios. With Polly, we would also like to explore chat-bot customization technology in which different knowledge components can be easily plugged in.
Ming Zhou is a principal researcher at Microsoft Research Asia. He is also director of the Chinese Information Technology Committee of the China Computer Federation, an executive member of the China Chinese Information Processing Society, and co-director of the Tsinghua-MS Joint Lab on Media and Networking. He is a PhD advisor at multiple universities, including Harbin Institute of Technology, Tianjin University, Nankai University and Shandong University. He obtained his bachelor's degree from Chongqing University in 1985 and his PhD from Harbin Institute of Technology in 1991, both in computer science. He was a post-doc at Tsinghua from 1991 to 1993 and then an associate professor until 1999. From 1996 to 1999, he led a team developing a Chinese-Japanese machine translation product at Kodensha Ltd. in Japan. He invented the first Chinese-English machine translation system (CEMT-1) and the well-known Chinese-Japanese machine translation product J-Beijing. He joined Microsoft Research Asia in 1999 and became the manager of its natural language group in 2001. Under his leadership, this group has developed many well-known NLP technologies, including MS IME, MS Couplets, Engkoo/Bing Dictionary, and Chinese-English machine translation for MS Translator and Skype Translator. In recent years, his group has worked with related product groups to develop popular chatbots such as XiaoIce, Rinna and Tay, with a total of 40 million users.
From big data to intelligence: industrial data mining practices in Baidu
by Zhiyong Shen (Big Data Lab (BDL), Baidu Research)
Internet companies like Baidu are born with big data, and they leverage big data technologies and infrastructure to improve product performance. We are trying to bring these big data advantages to traditional industries such as tourism, healthcare, finance and retail. I'll give a brief introduction to these projects.
Dr. Zhiyong Shen obtained his PhD from the State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences in 2009. He is currently a senior data scientist in the Big Data Lab (BDL), Baidu Research. Before joining Baidu in 2013, he worked as a research scientist at Hewlett-Packard Labs China. He received his bachelor's degree in statistics from the Department of Probability and Statistics, School of Mathematical Sciences, Peking University in 2003.
Building Interactive Analytics From the Ground Up: Architecture, Tradeoffs and a Lot of Work
by Ning Xie (SmartSCT)
Businesses need analytics to help them make better decisions from their data. Ideally, analytics would be interactive, visual, exploratory and, most importantly, easy enough for a business user. However, with the emergence of Big Data, analytics has recently moved in the opposite direction. Complex algorithmic programming and heterogeneous data sources require expensive data scientists who are beyond the reach of many businesses.
In this talk, I will discuss how we built a commercial-quality interactive analytics platform from the ground up to ease the complexities of Big Data. I will share our architectural decisions and tradeoff considerations, from service separation and the connector data abstraction up to the engine planner/executor design and the front-end visualization.
Ning Xie is the co-founder and CTO of SmartSCT, a startup focused on empowering the world with interactive data intelligence. Prior to founding SmartSCT, he spent ten years at Teradata leading its database engine development teams in the US and later heading up its China R&D Center in Beijing. He received his BS from Santa Clara University and MS from Carnegie Mellon University. His technology interests include analytics, distributed systems, database, and performance optimization.
ADAS based on deep learning, development and evaluation
by Yinan Yu (Horizon Robotics)
Over the last decade, deep learning has developed rapidly and attracted increasing attention. The year 2016 is a historic milestone: thanks to deep learning, automated driving functions have advanced from perception to decision-making, marking the beginning of a new era in artificial intelligence. However, deep learning in automated driving systems still faces challenges in algorithms, computing platforms and system integration.
Dr. Yinan Yu is a senior engineer at Horizon Robotics who specializes in computer vision, machine learning and autonomous driving. He served at Baidu IDL as a senior researcher before joining Horizon Robotics. He graduated from the Institute of Automation of the Chinese Academy of Sciences in 2012 and joined Baidu after graduation. He received Baidu's highest honor twice. In 2015, he completed a joint Baidu and Tsinghua University postdoctoral training program. Dr. Yinan Yu has won a number of international honors during and after his studies, including first place in PASCAL VOC in 2010 and 2011 and third place in ImageNet in 2013.
Techniques and Applications of the Intelligent Internet of Vehicles
by Ziyue Li (Beijing YESWAY Information Technology Co., Ltd.)
With the rapid development of the Internet and big data, telematics data plays an important role for both individuals and industries. Beijing YESWAY Information Technology Co., Ltd. has been a telematics service provider for nearly 10 years. It uses its data acquisition equipment and the Yesway Telematics Big Data Platform to provide telematics big data services to individuals and industries. This work has brought significant technological innovation in data collection, processing and storage. It has also made a significant contribution to data integration for transportation, insurance and other industries.
Ziyue Li is the Big Data Director of Beijing YESWAY Information Technology Co., Ltd. He graduated from Texas Southern University in Transportation Management and Planning in 2014. His research interest is big data analysis for transportation, and he has taken part in many traffic design programs. He is now working on telematics big data analysis and applications for individuals and industries. Through his data processing and modeling work, he has established a data cooperation relationship with the Beijing Municipal Commission of Transport to provide public travel services.
Data Driven Security: Big Data Analytics and Cyberspace Security
by Xiaosheng Tan (Qihoo 360)
Our lives are digitalized, interconnected and deeply reliant on the Internet, and the attack surface is larger than ever. Having failed to protect, we turned to “detection” and “response” instead of “prevention”. Signature-based anti-malware and rule-based IPS/IDS no longer work effectively. We have seen the rise of User and Entity Behavior Analytics and of big-data-based anomaly detection, but do they work? What are the challenges?
Xiaosheng Tan is Chief Security Officer of Qihoo 360. He also serves as an executive member of the council and Deputy Secretary General of the China Computer Federation (CCF), and he has been honored as a high-end leading figure of Zhongguancun. He has successively worked at Xi’an Jiaotong University, Founder Group of Peking University, Shenzhen Modern Computer Co., Ltd. and Shenzhen Horson Technology Co., Ltd. In these roles, he worked on the R&D of anti-virus systems for DOS, disc anti-copy software, Chinese operating systems, and large management information systems. He entered the Internet industry in 2003. He then served as director of R&D at 3721, which was soon acquired by Yahoo, as CTO of Yahoo! China, and as CTO and COO of MySpace China. Xiaosheng joined Qihoo 360 in 2009 as VP of technology and began serving concurrently as Chief Privacy Officer at 360 in March 2012. He is an expert in cloud computing, information security, and search technology. He is currently a part-time professor and director at Chongqing University of Posts and Telecommunications, and an entrepreneur supervisor at Beijing University of Posts and Telecommunications.
Applying Big Data and Learning Techniques to Problems in Security
by Eric Hsu (KuangEn)
The engineering challenges presented by ever-growing data domains are well-appreciated, and have promoted the development of ever-improving algorithmic and implementational techniques for processing big data. Meanwhile, when employing such data in machine learning, the traditional classification-oriented perspective has been that more data points make for an easier learning task. How does this viewpoint change for a domain like security, where we want to isolate specific incidents like network intrusion (and thus are not in a purely unsupervised learning domain) and must prevent novel attacks in real time (and thus lack the labeled training examples characterizing a supervised domain)? This talk will survey existing artificial intelligence approaches to securing large-scale systems, emphasizing an “anomaly detection” perspective where finding ways to model normalcy can be just as useful as modelling the malicious behaviours we wish to prevent.
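As a toy illustration of the “model normalcy” perspective, the sketch below fits a mean and standard deviation to one feature observed on benign traffic. It then flags values whose z-score exceeds a threshold. The feature (connections per minute), the sample data and the threshold of 3 are hypothetical. Real systems use far richer models; this is only the simplest instance of the idea.

```python
import statistics


class NormalcyModel:
    """Models 'normal' behaviour of one numeric feature and flags outliers."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # hypothetical z-score cut-off
        self.mean = 0.0
        self.stdev = 1.0

    def fit(self, benign_values):
        # Learn only from benign observations: no attack labels are needed.
        self.mean = statistics.mean(benign_values)
        # Fall back to 1.0 if every benign value is identical (zero variance).
        self.stdev = statistics.pstdev(benign_values) or 1.0
        return self

    def score(self, value):
        # Distance from normal behaviour, in standard deviations.
        return abs(value - self.mean) / self.stdev

    def is_anomalous(self, value):
        return self.score(value) > self.threshold


# Hypothetical connections-per-minute counts observed for one host.
benign = [10, 12, 11, 9, 10, 13, 11, 12, 10, 11]
model = NormalcyModel().fit(benign)
print(model.is_anomalous(12), model.is_anomalous(60))  # False True
```

No malicious examples are required to train this model, which is why normalcy modeling suits settings where novel attacks lack labels.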
Before joining Kuang En to apply artificial intelligence techniques to the security domain, Eric Hsu performed basic research in planning and scheduling at SRI International. He also built commercial machine learning systems for digital advertising startups that were later acquired by Yahoo and AOL. After receiving an undergraduate degree from Harvard University, he completed his graduate studies at Stanford University and the University of Toronto. His general research interests involve combining discrete and continuous optimization techniques to solve applied problems in artificial intelligence.
Ring: Emerging Event Detection in Social Networks and Data Processing Platform
by Jianxin Li (Beihang University)
In this talk, we demonstrate RING, a real-time emerging event monitoring system over microblog text streams. RING integrates our research on emerging event monitoring with our systems research. For event detection from streaming microblog text, RING offers three capabilities: (1) it detects emerging events at an earlier stage than existing methods; (2) it is among the first systems to discover correlations between emerging events in a streaming fashion; and (3) it reveals event evolution at different time scales, from minutes to months. We have also built the RING data processing system, which has three components: (1) support for typical open source systems, including Kafka, HBase, Alluxio, Spark and Elasticsearch, together with incremental algorithms for NLP, anomalous subgraph detection, feature extraction, etc.; (2) an incremental graph processing system based on the SSP (Stale Synchronous Parallel) protocol; and (3) an efficient Parameter Server and machine learning system based on a distributed shared memory (DSM) framework. RING also takes part in HotBox, an ongoing joint system with CMU that serves as a feature engineering engine for machine learning at scale.
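The SSP protocol bounds how far fast workers may run ahead of stragglers. This pure-Python sketch shows only that clock rule. The class name and the exact convention are our assumptions: a worker at clock c may finish iteration c only while c minus the slowest clock is at most `staleness`. RING's actual implementation is not described here. With `staleness=0`, the rule behaves like bulk synchronous barriers.

```python
class SSPClock:
    """Per-worker iteration clocks under a Stale Synchronous Parallel rule.

    Convention (an assumption for this sketch): a worker whose clock is c may
    finish iteration c only while c - min(all clocks) <= staleness.
    """

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers
        self.staleness = staleness

    def can_advance(self, worker):
        # Bounded staleness: never run more than `staleness` clocks ahead
        # of the slowest worker.
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def tick(self, worker):
        """Try to finish the worker's current iteration; False means it must wait."""
        if not self.can_advance(worker):
            return False
        self.clocks[worker] += 1
        return True


clock = SSPClock(num_workers=3, staleness=2)
print([clock.tick(0) for _ in range(4)])  # [True, True, True, False]
clock.tick(1)
clock.tick(2)
print(clock.tick(0), clock.clocks)  # True [4, 1, 1]
```

Worker 0 is blocked after three iterations until the other workers catch up. This is how SSP trades a bounded amount of staleness for fewer global barriers.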
Jianxin Li is a professor at the School of Computer Science and Engineering, Beihang University, a senior member of CCF, and a member of IEEE and ACM. He received his BS and Ph.D. degrees from Beihang University, China in 2001 and 2008, respectively. He was a visiting scholar in the Machine Learning Department of Carnegie Mellon University (CMU) in 2015, a visiting researcher at Microsoft Research Asia (MSRA) in 2011, and a short-term visiting researcher at Rutherford Appleton Laboratory (RAL), UK in 2008. His current research interests include data analysis and processing, virtualization and cloud computing.
Big Data Driven Intelligent Software Development Techniques and Environment
by Bing Xie (Peking University)
With the development of open source software, a massive variety of software resources has accumulated, including code, documents, emails, bug reports, comments, Q&As, forums and blogs. These resources are distributed across the Internet and have become software big data. Software big data is characterized by massive volume, wide variety, rich semantics and close interrelations. The knowledge and intelligence embodied in software big data can provide new technologies and platforms for intelligent software development. This talk will analyze the state of the art of software big data and discuss the resulting changes in software development techniques and environments. The research will address two fundamental technologies, data accumulation/fusion and knowledge acquisition/usage, and will then present the intelligent software development environment as a whole. The talk will introduce the main research work and targets of our project, which is sponsored by a national key research and development program.
Bing Xie was born in 1970. He received his Ph.D. degree in Computer Science from the National University of Defense Technology in 1998. He is now a professor and doctoral supervisor at Peking University. His research interests include software engineering, formal methods, etc. As a project leader, he has been in charge of projects in the 863 Program, the National Key Technology R&D Program and the National Science Fund for Distinguished Young Scholars. He has also participated in the 973 Program, the 863 Program and Key Programs supported by the Ministry of Science and Technology of China. He was awarded the second prize of the National Science and Technology Progress Award in 2006 and the State Technological Invention Award in 2015. He has contributed to the research and development of software engineering and domestically made software tools, as well as their industrialization. He has pioneered applications in the software industry, the information service industry, the defense industry and IC design. He has published over 70 papers in conferences and journals in the fields of software engineering and programming languages, such as FSE and POPL. He is also a fellow of the China Software Industry Association and the China Computer Federation, as well as an editorial board member of the Chinese Journal of Electronics.
Distributed Multi-Layer Indexing Scheme for Server-centric Cloud Storage System
by Xiaofeng Gao (Shanghai Jiao Tong University)
Cloud storage systems pose new challenges in supporting efficient concurrent query tasks for various data-intensive applications, where indices always play an important role. In this talk, we explore a practical method to construct a two-layer indexing scheme for multi-dimensional data in diverse server-centric cloud storage systems. We first propose RT-HCN, an indexing scheme integrating an R-tree based indexing structure and an HCN-based routing protocol.
RT-HCN organizes storage and compute nodes into an HCN overlay, one of the newly proposed server-centric data center topologies. Based on the properties of HCN, we design a specific index mapping technique to maintain layered global indices, along with corresponding query processing algorithms to support efficient query tasks. Then, we extend the idea of RT-HCN to another server-centric data center topology, DCell, suggesting a general and feasible way of deploying two-layer indexing schemes on other server-centric networks. Furthermore, we prove theoretically that RT-HCN is both space-efficient and query-efficient: each node maintains a tolerable number of global indices, while highly concurrent queries can be processed with acceptable overhead. Finally, we conduct targeted experiments on Amazon's EC2 platform, comparing our design with RT-CAN, a similar indexing scheme for traditional P2P networks. The results validate the query efficiency of RT-HCN, especially its speedup for point queries, and demonstrate its potential applicability in future data centers.
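A small pure-Python sketch can illustrate the general two-layer idea. Each storage node publishes the minimum bounding rectangle (MBR) of its local data to a global layer. A range query is then forwarded only to nodes whose published MBR intersects the query. The class names are hypothetical, and the local layer is a linear scan standing in for a per-node R-tree. The HCN/DCell topology and RT-HCN's index mapping are not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other):
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)

    def contains_point(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


@dataclass
class StorageNode:
    name: str
    # Local layer: a plain list here; a real node would keep a local R-tree.
    points: list = field(default_factory=list)

    def mbr(self):
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return Rect(min(xs), min(ys), max(xs), max(ys))

    def local_query(self, q):
        return [p for p in self.points if q.contains_point(*p)]


class GlobalIndex:
    """Global layer: the MBR each node publishes, mapped back to that node."""

    def __init__(self, nodes):
        self.entries = [(node.mbr(), node) for node in nodes]

    def range_query(self, q):
        contacted, hits = [], []
        for mbr, node in self.entries:
            if mbr.intersects(q):  # prune nodes whose data cannot match
                contacted.append(node.name)
                hits.extend(node.local_query(q))
        return contacted, hits


nodes = [
    StorageNode("n1", [(0, 0), (1, 1), (2, 2)]),
    StorageNode("n2", [(10, 10), (11, 12)]),
    StorageNode("n3", [(5, 0), (6, 1)]),
]
contacted, hits = GlobalIndex(nodes).range_query(Rect(0.5, 0.5, 5.5, 2))
print(contacted, hits)  # ['n1', 'n3'] [(1, 1), (2, 2)]
```

Node n3 is contacted even though it holds no matching point, because an MBR over-approximates the data it covers. Node n2 is pruned without being contacted at all.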
Xiaofeng Gao received the B.S. degree in information and computational science from Nankai University, China, in 2004; the M.S. degree in operations research and control theory from Tsinghua University, China, in 2006; and the Ph.D. degree in computer science from The University of Texas at Dallas, USA, in 2010. She is currently an Associate Professor with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. Her research interests include data engineering, wireless communications, and combinatorial optimization. She has published more than 100 peer-reviewed papers in related areas. These include papers in well-archived international journals such as IEEE TC, IEEE TKDE and IEEE TPDS, and in well-known conference proceedings such as INFOCOM and SIGKDD. She has served on the editorial board of Discrete Mathematics, Algorithms and Applications, and as a PC member and peer reviewer for a number of international conferences and journals.
Brain Quest: Perception-Guided Brain Network Comparison
by Lei Shi (Institute of Software, Chinese Academy of Sciences)
Why are some people more creative than others? How do human brain networks evolve over time? A key stepping stone to both mysteries, and many more, is comparing weighted brain networks. In contrast to networks arising from other application domains, brain networks exhibit their own characteristics (e.g., high density, indistinguishability). These characteristics make off-the-shelf data mining algorithms and visualization tools sub-optimal or even misleading. In this talk, we propose a shift away from the current mining-then-visualization paradigm toward jointly modeling these two core building blocks (i.e., mining and visualization) for brain network comparison. The key idea is to integrate human perception constraints into the mining block early, so as to guide the analysis process. We formulate this as a multi-objective feature selection problem and propose an integrated framework, Brain Quest, to solve it. This talk will also discuss the integrated visual analytics pipeline that orchestrates computational models with comprehensive data visualizations of human brain networks. Finally, user studies, quantitative experiments, and two real-world case studies evaluating the proposed method will be presented.
Lei Shi is an associate research professor in the State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences. Before that, he was a research staff member and research manager at IBM Research - China. He holds B.S. (2003), M.S. (2006) and Ph.D. (2008) degrees from the Department of Computer Science and Technology, Tsinghua University. His current research interests are visual analytics and data mining, and he has published more than 60 papers in top-tier venues such as IEEE TVCG, TKDE, TC, VIS, ICDE, ICDM, Infocom, ACM SIGCOMM and CSCW. He received the IBM Research Division Award on "Visual Analytics" and won the IEEE VAST Challenge Award twice, in 2010 and 2012. He has organized several workshops on combining visual analytics and data mining at ICDM, CIKM and ACM BCB. He is an IEEE senior member.
The Representation and Embedding of Knowledge Bases
by Richong Zhang (Beihang University)
The models developed to date for knowledge base embedding are all based on the assumption that the relations contained in knowledge bases are binary. To train and test these embedding models, multi-fold (or n-ary) relational data are converted to triples (e.g., in the FB15K dataset) and interpreted as instances of binary relations. However, this representation approach and data conversion are irreversible, so the original multi-fold facts cannot in general be recovered from the triples. We advocate a novel modeling framework that models multi-fold relations directly using a canonical representation. Using this framework, the existing TransH model is generalized to a new model, m-TransH. The detailed framework setup and implementation algorithms will be presented in this talk.
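For reference, the binary TransH model that m-TransH generalizes scores a triple (h, r, t) in two steps. It first projects both entity vectors onto a relation-specific hyperplane with normal w_r. It then measures how well a translation vector d_r links the projected entities. The pure-Python sketch below implements only this standard TransH score. The multi-fold m-TransH scoring function is not reproduced here, and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(w):
    norm = math.sqrt(dot(w, w))
    return [x / norm for x in w]


def project(e, w):
    """Project entity vector e onto the hyperplane with unit normal w: e - (w.e) w."""
    s = dot(w, e)
    return [ei - s * wi for ei, wi in zip(e, w)]


def transh_score(h, t, w_r, d_r):
    """TransH dissimilarity ||h_perp + d_r - t_perp||_2^2 (lower = more plausible)."""
    w = normalize(w_r)  # TransH constrains the hyperplane normal to unit length
    hp, tp = project(h, w), project(t, w)
    return sum((a + b - c) ** 2 for a, b, c in zip(hp, d_r, tp))


# Hypothetical 3-d embeddings: the hyperplane normal is the z-axis.
w_r, d_r = [0.0, 0.0, 2.0], [1.0, 0.0, 0.0]
print(transh_score([0, 0, 5], [1, 0, -3], w_r, d_r))  # 0.0
print(transh_score([0, 0, 5], [0, 1, 0], w_r, d_r))   # 2.0
```

The first pair scores 0.0 even though the raw vectors differ along z. The projection discards that component, which is how TransH lets one entity play different roles across relations.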
Richong Zhang received his BS and MAS from Jilin University, China in 2001 and 2004, respectively. He received his PhD from the School of Information Technology and Engineering, University of Ottawa in 2011. He is currently an associate professor in the School of Computer Science and Engineering, Beihang University. His research interests include machine learning, data mining and their applications in knowledge graphs, NLP and crowdsourcing.
Spatio-temporal Crowdsourcing: A Novel Computation Paradigm in The Era of Sharing Economy
by Yongxin Tong (Beihang University)
Crowdsourcing is a new computation paradigm in which humans are actively enrolled to participate in the computing process, especially for tasks that are intrinsically easier for humans than for computers. Recently, the rapid development of the mobile Internet and Online-To-Offline (O2O) services has led to a boom in spatio-temporal crowdsourcing platforms of all types. On these platforms, each user is treated as a mobile computing unit that can be activated and guided to perform certain tasks. In this talk, I will first briefly review the history of crowdsourcing and introduce some representative industrial applications of spatio-temporal crowdsourcing in the era of the sharing economy, such as Uber, Didi Taxi and Gigwalk. Then, I will discuss the core issues related to spatio-temporal crowdsourcing and introduce our recent research on real-time online task assignment. Finally, I will highlight some interesting future work on spatio-temporal crowdsourcing.
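To make real-time online task assignment concrete, here is a deliberately simple greedy baseline. Tasks arrive one at a time, and each is matched immediately and irrevocably to the nearest available worker within that worker's travel radius. This is a hypothetical illustration, not the speaker's algorithm. The radius model and the Euclidean distance are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Worker:
    wid: str
    x: float
    y: float
    radius: float  # hypothetical: maximum distance this worker will travel


@dataclass
class Task:
    tid: str
    x: float
    y: float


def assign_online(tasks, workers):
    """Greedy online assignment: tasks arrive in list order; each is matched
    immediately and irrevocably to the nearest available worker in range."""
    available = {w.wid: w for w in workers}
    matching = {}
    for task in tasks:
        best, best_d = None, math.inf
        for w in available.values():
            d = math.hypot(w.x - task.x, w.y - task.y)
            if d <= w.radius and d < best_d:
                best, best_d = w, d
        if best is not None:  # otherwise the task goes unserved
            matching[task.tid] = best.wid
            del available[best.wid]
    return matching


workers = [Worker("a", 0, 0, 5), Worker("b", 4, 0, 5)]
tasks = [Task("t1", 1.5, 0), Task("t2", -2, 0)]
print(assign_online(tasks, workers))  # {'t1': 'a'}
```

In this example, the greedy rule gives t1 to its nearest worker a and then leaves t2 unserved. An offline matching could serve both tasks (t1 by b, t2 by a). This gap caused by arrival order is one reason online assignment is studied as its own problem.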
Dr. Yongxin Tong received a Ph.D. degree in Computer Science from the Department of Computer Science and Engineering, The Hong Kong University of Science and Technology (HKUST). He is currently an associate professor in the School of Computer Science and Engineering, Beihang University. Before that, he served as a research assistant professor and a postdoctoral fellow at HKUST. His research interests include crowdsourcing, uncertain data processing and social network analysis. He has published more than 20 papers in highly refereed database and data mining journals and conferences such as SIGMOD, SIGKDD, VLDB, ICDE, TKDE, and TOIS. Dr. Tong was awarded the Microsoft Research Asia Fellowship in 2012. He also received the Excellent Demonstration Award at VLDB 2014 and the Best Paper Award at WAIM 2016. He has also been a reviewer for leading academic journals such as TKDE, and has served on the program committees of prestigious international conferences such as IJCAI 2015.
Registration Deadline: 27th July, 2016