Members of DMM's AI team attended the IEEE data engineering conference held in Macau this April (International Conference on Data Engineering 2019, known as ICDE) and share some highlights of the conference below.
This April, the IEEE's annual conference on data engineering was held in Macau. It is considered an A-level conference in the broader academic community around data science topics. It also has a major industry presence, especially from leading Chinese tech companies such as Alibaba, Baidu, and Huawei (perhaps partially due to this year's location), with dedicated industry sessions and dozens of collaboratively published papers. My goal in attending was to take notes on paper presentations and keynotes that may be relevant to our work on machine learning and AI for web services at DMM. In the process, I also observed some larger themes or trends within this part of the academic community that I think are worth sharing. These trends may help us understand which techniques and technologies will become commonplace in the industry in the coming years, which in turn helps DMM's AI team know what we should be trying in our projects and narrow down what kind of candidates we should try to recruit. Below, I share some highlights from the conference, organized by these major themes:
Adding an attention layer
Across a wide variety of topics and baseline architectures, adding an attention mechanism to an existing well-known deep learning approach in order to beat the state of the art seemed to be an easily publishable topic at the conference. An attention mechanism is essentially an additional component placed between existing layers of a neural network (most commonly between the encoder and the initialization of the decoder) that computes a weighted "context" vector from the previous layers, helping the subsequent layer "pay more attention" to specific parts of the input. One of attention's most popular uses is sequence-to-sequence modeling for machine translation.
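To make the idea concrete, here is a minimal sketch of dot-product attention in plain Python. It is an illustration only, not the method of any paper presented at the conference; the function names (`softmax`, `dot_product_attention`) and the toy vectors are assumptions made for this example. Each earlier state (key) is scored against a query, the scores are normalized into weights, and the context vector is the weighted sum of the associated values.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    """Return (weights, context) for one query over lists of key and value vectors."""
    # Score each key by its dot product with the query.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * value[i] for w, value in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy example (hypothetical numbers): the query matches the first key best,
# so the first value contributes most to the context vector.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
weights, context = dot_product_attention(query, keys, values)
print(weights, context)
```

In a real model the query, keys, and values would be learned hidden states rather than hand-written vectors, and the attention weights are often the part people inspect to see what the model is "paying attention to".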
At the conference, there was a paper on adding attention to long short-term memory recurrent neural networks (LSTMs) for recommendations using clickstream data. Another presenter took a similar approach (LSTM + matrix factorization + attention layer) on multiple time series measured during the manufacturing of machinery. There was a paper on adding attention to Doc2Vec to model smartphone users' multi-tasking behavior and more accurately predict the next app they will use. And in a workshop session there were several papers introducing algorithms involving attention, such as a "co-attention" neural network used to pick up on context in online user reviews.
As for how this collection of research would apply to a more "real world" case in web services, we could consider it an augmentation to many of the approaches the AI team at DMM is taking to detect fraud, make product recommendations, and solve customer service issues. Perhaps we could suggest the following development cycle, which is generally in line with how much of the presented research was conducted:
- First, try a well-known, easy-to-implement approach (CNN, DNN, RNN) on the research problem
- When measuring performance, add a qualitative evaluation of what the model gets wrong, what confuses it, and what it should pay more attention to
- Explore whether adding an attention layer would help with the identified qualitative issues
- If attention improves upon the baseline model, develop it further for production (attention mechanisms are not as "out-of-the-box" as they could be)
Graphs at scale
It was obvious at the conference that many large tech companies (especially those from China) are seeing performance improvements from using graph-based approaches for deep learning, information retrieval, and even attribute extraction problems. Highlights of those approaches are discussed later in this post. First, though, this shift has created a need for the same companies, which work with massive amounts of customer and product data, to process graph data more efficiently.
To me, much of the research in this area felt similar to the improvements that higher-level query and dataframe layers for distributed processing (such as Hive and Pig on MapReduce, or Dask) brought to more standard tabular data. The goal of this data engineering research is typically to find a more efficient way to traverse or query a graph for a given query or transformation, without changing how that query or transformation is phrased.
On the topic of knowledge graphs, one keynote speaker pointed out that knowledge graphs need sets of rules to make full sense and that this hasn't been formally covered enough in the literature. As an example of such rules, consider a graph composed of nodes that are people and edges that represent relationships ("daughter of", "sister to"). In this graph, if X is the mother of Y, then Y cannot be the mother of X. This seems simple, but it is a crucial part of what comprises "knowledge" in a knowledge graph.
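The mother-of rule above can be sketched as a simple consistency check over (subject, relation, object) triples. This is only an illustration of the idea, not something shown in the keynote; the relation names, the `ASYMMETRIC_RELATIONS` set, and the function name are assumptions made for this example.

```python
# Hypothetical rule set: relations that may never hold in both directions.
ASYMMETRIC_RELATIONS = {"mother_of"}


def find_asymmetry_violations(triples, asymmetric=ASYMMETRIC_RELATIONS):
    """Return sorted (a, relation, b) tuples where an asymmetric relation
    is asserted both as (a, relation, b) and as (b, relation, a)."""
    facts = set(triples)
    violations = set()
    for subject, relation, obj in facts:
        if relation in asymmetric and (obj, relation, subject) in facts:
            # Order the two people so that each conflict is reported once.
            first, second = sorted([subject, obj])
            violations.add((first, relation, second))
    return sorted(violations)


triples = [
    ("ann", "mother_of", "bea"),
    ("bea", "mother_of", "ann"),  # contradicts the line above
    ("bea", "sister_to", "cat"),
    ("cat", "sister_to", "bea"),  # fine: "sister_to" is not in the rule set
]
print(find_asymmetry_violations(triples))  # [('ann', 'mother_of', 'bea')]
```

A fuller rule set would also cover things like symmetric or transitive relations, but the point is the same: the rules are part of the knowledge, separate from the edges themselves.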
In another keynote, Zhou Jingren from Alibaba discussed his team's research on using a "dynamic dataflow" to parallelize graph traversal queries. He mentioned that his data engineers have had to develop new strategies such as this kind of dynamic dataflow to address the limitations of traditional depth-first and breadth-first graph traversal methods, which do not scale to the size of Alibaba's data.
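For reference, here is a minimal sketch of the traditional breadth-first traversal that the keynote contrasted against. It is not the dynamic dataflow approach, whose details were not covered in my notes; the function name and the toy graph are assumptions for illustration. The traversal expands the graph one level (frontier) at a time on a single machine.

```python
def bfs_levels(adjacency, source):
    """Return the nodes reachable from `source`, grouped by distance from it."""
    visited = {source}
    frontier = [source]
    levels = []
    while frontier:
        levels.append(frontier)
        next_frontier = []
        # Every node in the current frontier is expanded before moving on.
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return levels


# Hypothetical toy graph as an adjacency list.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_levels(graph, "a"))  # [['a'], ['b', 'c'], ['d']]
```

The `visited` set and the frontier both live in one process's memory here, which gives a feel for why this style of traversal becomes hard to apply directly once the graph no longer fits on one machine.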
Altogether, over half of the keynotes at this year's conference discussed large-scale graph data at some point. That is a noticeably larger share than the proportion of conference sessions specifically about graph data. Because of that, I feel it cannot be ignored as a place where progress is being made within data engineering.
Beyond the keynotes, there were several sessions each day devoted to more specific technical topics in graph processing. New techniques for parallel processing of graph data were discussed for running enumeration calculations on a single machine, handling searches on a knowledge graph, and computing general graph analytics. Aside from parallel processing, other research discussed more efficient ways to do popular and standard graph computations such as random walks. Some research introduced techniques for identifying specific patterns in a graph, such as a motif, a triangle, or a pattern revealing fraud. While mostly applicable to social network data, there were also a number of papers on topics such as identifying influence in an ephemeral or uncertain graph and processing graph data that is updated as a stream.
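As a point of reference for the pattern-finding work mentioned above, here is a naive way to count triangles in an undirected graph. None of the presented papers' techniques are reproduced here; this is only the kind of straightforward computation that becomes expensive on very large graphs, and the function name and toy edges are assumptions for illustration.

```python
from itertools import combinations


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) edge pairs."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    found = 0
    for node, neighbours in adjacency.items():
        # A triangle exists whenever two neighbours of `node` are also connected.
        for v, w in combinations(neighbours, 2):
            if w in adjacency[v]:
                found += 1
    # Each triangle is found once from each of its three corners.
    return found // 3


edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(count_triangles(edges))  # 1
```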
I think overall the advances in this area are not useful to the industry unless they are released by some of the market leaders or the open source community. Perhaps the best available tool for graph manipulation I learned about was Apache TinkerPop and its Gremlin traversal language, which make it easy to express graph traversal queries in a style similar to Apache Beam or even some flavor of SQL.
Graphs for deep learning
The first presentation including information about using graph data for deep learning again came from Alibaba's Zhou Jingren. Here, he introduced the concept of "unified graph embeddings"; that is, in an e-commerce context a user and a product can be represented by the same graph properties and embedded into the same lower-dimensional vector space. I personally think this was the most interesting idea and takeaway from the whole conference. While most graph embedding approaches in the literature simply embed first- and second-order proximities, another paper at the conference introduced methods for embedding centrality concepts like PageRank and degree. Finally, there was another quite technical paper related to graph embeddings that introduced a simpler but equally effective alternative to Generative Adversarial Networks (GANs) for negative sampling in knowledge graph embeddings.
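To illustrate why a shared vector space is attractive, here is a small sketch in which users and products live in the same space, so one similarity function answers both "which products suit this user?" and "which users suit this product?". The vectors are hand-written and hypothetical; a real system would learn them from the graph, and none of this reflects how Alibaba's system is actually implemented.

```python
import math

# Hypothetical, hand-written embeddings keyed by (node type, name).
EMBEDDINGS = {
    ("user", "alice"): [0.9, 0.1, 0.0],
    ("user", "bob"): [0.1, 0.8, 0.3],
    ("product", "leather_jacket"): [0.8, 0.2, 0.1],
    ("product", "board_game"): [0.0, 0.9, 0.4],
    ("product", "novel"): [0.3, 0.3, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_key, target_type, embeddings=EMBEDDINGS, top_k=2):
    """Return the names of the `top_k` nodes of `target_type` closest to `query_key`."""
    query = embeddings[query_key]
    scored = [
        (cosine(query, vector), name)
        for (kind, name), vector in embeddings.items()
        if kind == target_type and (kind, name) != query_key
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


print(nearest(("user", "alice"), "product"))
print(nearest(("product", "leather_jacket"), "user", top_k=1))
```

Because both node types share one space, the same lookup works in either direction without training a separate model per node type.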
Data mining, attribute and tag extraction (with graphs)
While graphs did not completely dominate this area, they also made a showing in data mining topics. One presentation, which seemed novel to me, reframed the task of uncovering tags and attributes to associate with items on an e-commerce marketplace as a "knowledge base completion" problem using the existing knowledge graph and user data such as new searches. Also related to searches and applicable to an e-commerce "knowledge base" or "knowledge graph" was a paper that introduced new algorithms for answering "why?" and "why not?" questions about the results of a graph-based query. For example, in the UI of a search results page it might be useful to include an option to explain why a result appeared or why a certain kind of result didn't appear (why aren't there any leather jackets in the results? => the price filter in the search was set too high).
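The leather jacket example can be sketched as a tiny "why not?" explainer. This is a toy over flat search filters, not the subgraph-query algorithm from the paper; the item fields, filter names, and the `why_not` function are all assumptions made for illustration.

```python
def why_not(item, filters):
    """Return the names of the active filters that exclude `item` from the results."""
    return [name for name, accepts in filters.items() if not accepts(item)]


# Hypothetical catalogue item and search filters.
leather_jacket = {"category": "jacket", "material": "leather", "price": 300}
active_filters = {
    "category is jacket": lambda item: item["category"] == "jacket",
    "price at least 500": lambda item: item["price"] >= 500,
}

print(why_not(leather_jacket, active_filters))  # ['price at least 500']
```

An empty list would mean that no filter rules the item out, so its absence from the results would need a different explanation (for example, ranking or stock).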
For data scientists in industry working on solutions for web services companies (which mostly means working with clickstream data), there's much to take away from this conference and from attending an academic conference like this in general:
- Of the major themes I noticed in the conference, I think the one DMM's data engineers should get on board with first is trying graph-based deep learning, which would also involve structuring some user or item data as a heterogeneous multi-attributed graph. I hope we can do one project with this by the end of the year!
- Publishing at this level of conference on the industry track seems like a reachable goal for the AI team at DMM if we can dedicate some of our time to further research and to writing.
- We should make sure to attend a few of these conferences each year to stay in touch with the direction of the academic community. For example, the International Conference on Very Large Data Bases (VLDB) will be held in Tokyo in 2020. We should definitely send one or two members to attend!
References
- AIR: Attentional Intention-Aware Recommender Systems
- Interpretable Multi-Task Learning for Product Quality Prediction with Attention Mechanism
- AppUsage2Vec: Modeling Smartphone App Usage for Prediction
- Context-aware Co-Attention Neural Network for Service Recommendations
- Enterprise Knowledge Graphs: Principles, Applications, and Opportunities (keynote)
- Managing, Analyzing, and Learning Heterogeneous Graph Data: Challenges and Opportunities (keynote)
- Efficient Parallel Subgraph Enumeration on a Single Machine
- An Efficient Parallel Keyword Search Engine on Knowledge Graphs
- TuFast: A Lightweight Parallelization Library for Graph Analytics
- Walking with Perception: Efficient Random Walk Sampling via Common Neighbor Awareness
- Discovering Maximal Motif Cliques in Large Heterogeneous Information Networks
- Tracking Influential Nodes in Time-Decaying Dynamic Interaction Networks
- Mining Periodic Cliques in Temporal Networks
- Fast and Accurate Graph Stream Summarization
- Online Social Media Recommendation over Streams
- Exploiting Centrality Information with Graph Convolutions for Network Representation Learning
- NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding
- Towards the Completion of a Domain-Specific Knowledge Base with Emerging Query Terms
- Answering Why-Questions for Subgraph Queries in Multi-Attributed Graphs