Predictive Analytics World London
etc.venues, 200 Aldersgate, 11-12 October, 2017
Predictive Analytics World for Business - London - Day 1 - Wednesday, October 11th, 2017
- The core reason for data visualization
- Learning the patterns corresponding to various multivariate relations
- A topology of proximity opening the way for visualization in Big Data
A dataset with M items has 2^M subsets, any one of which may be the one satisfying our objective. With a good data display and interactivity, our fantastic pattern recognition defeats this combinatorial explosion by extracting insights from the visual patterns. This is the core reason for data visualization. With parallel coordinates, the search for relations in multivariate data is transformed into a 2-D pattern recognition problem. Together with criteria for good query design, we illustrate this on several real datasets (financial, process control, credit-score, one with hundreds of variables) with stunning results. A geometric classification algorithm yields the classification rule explicitly and visually. The minimal set of variables (features) is found and ordered by predictive value. A model of a country’s economy reveals sensitivities, the impact of constraints, trade-offs and economic sectors unknowingly competing for the same resources. An overview of the methodology provides foundational understanding by teaching the patterns corresponding to various multivariate relations. These patterns are robust in the presence of errors, which is good news for applications. A topology of proximity emerges, opening the way for visualization in Big Data. Learn how to answer questions you did not know ... how to ask.
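As one illustration of a "pattern corresponding to a relation", here is a minimal sketch of the well-known point-line duality in parallel coordinates: points lying on a line y = m*x + b (with m != 1) map to segments between two parallel axes that all pass through a single point. The function names and the axis spacing d are assumptions made for this example and are not taken from the talk.

```python
def segment_height(x, y, t, d=1.0):
    """Height at horizontal position t of the segment joining (0, x) on the
    first axis to (d, y) on the second axis."""
    return x + (t / d) * (y - x)


def dual_point(m, b, d=1.0):
    """Point shared by all segments that represent points on y = m*x + b."""
    if m == 1:
        raise ValueError("slope 1 gives parallel segments with no finite intersection")
    return d / (1 - m), b / (1 - m)


# Hypothetical linear relation between two variables.
m, b = -2.0, 3.0
t_star, h_star = dual_point(m, b)

# Every data point on the line yields a segment through (t_star, h_star).
xs = [-1.0, 0.0, 0.5, 2.0, 4.0]
heights = [segment_height(x, m * x + b, t_star) for x in xs]
```

A negative slope places the shared point between the two axes, which is why negatively correlated variables show up as a visible crossing in a parallel-coordinates display.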
- How operational research techniques can enhance predictive analytics
- Optimisation methods to find the profit optimal footprint for Retailers
- Simulation techniques to model the flow of patients in Emergency departments
Powerful, user-friendly tools are critical for successful implementation and to empower decision makers for their ongoing and more robust decision making. Two tools are presented to demonstrate how operational research techniques can enhance predictive analytics by exploring alternative scenarios under future uncertainty and change. The first tool employs optimisation methods to find the profit optimal footprint for Retailers, taking into account how business flows around the network as it changes. The second tool uses simulation techniques to model the flow of patients in Emergency departments. It has been used at two Emergency departments to help understand the impact of reconfiguring the departments, changes to workforce rotas and changes to capacity levels.
- Every analytics challenge reduces, at its technical core, to optimizing a metric
- As algorithms improve over time, one could imagine obtaining a solution merely by defining the guiding metric
- But are our tools that good? Are we aiming them in the right direction?
Every analytics challenge reduces, at its technical core, to optimizing a metric. Product recommendation engines push items to maximize a customer's purchases; fraud detection algorithms flag transactions to minimize losses; and so forth. As modeling and classification (optimization) algorithms improve over time, one could imagine obtaining a solution merely by defining the guiding metric. But are our tools that good? More importantly, are we aiming them in the right direction? I think, too often, the answer is no. I'll argue for clear thinking about what exactly it is we ask our computer assistant to do for us, and recount some illustrative war stories. (Analytic heresy guaranteed.)
- Anti-money-laundering (anti-ML) efforts are a large part of financial crime compliance
- Successful money-laundering (ML) detection is inherently different from classical anomaly detection in time-series data
- The risk-based approach to anti-ML and what this means for ML detection techniques
A large part of financial crime compliance for financial institutions lies in anti-money-laundering (anti-ML) efforts, where banks are required to conduct ongoing client due diligence and transaction monitoring. Predictive analytics is increasingly relied upon for the latter. However, successful money-laundering (ML) detection is inherently different from classical anomaly detection in time-series data. Global financial action task forces advocate a risk-based approach to anti-ML, and this presentation will cover what this means for ML detection techniques.
- Algorithm for re-distributing marketing budget according to the input of each channel
- How machine learning can increase overall e-commerce spending and the revenue per customer
- Open source tools and query samples in a step-by-step guide to implement your own predictive customer journey
Together with our client 220-volt.ru, one of the largest Russian DIY online stores, we set out to understand customer behaviour and to find an algorithm for re-distributing the marketing budget according to the input of each channel. To do this, we took historical user data and calculated the probability of purchase. In this deep dive / case study we will show how we used machine learning to increase overall e-commerce spending by up to 20%. The revenue per customer also increased significantly: it doubled in comparison to the control group. We first tested different machine learning models: logistic regression, random forest and XGBoost. We then evaluated them with the ROC AUC metric, using holdout and cross-validation. Finally, to make sure our predictions were working, we ran an A/B test. In our workshop-like session we will provide all tools and query samples in a step-by-step guide, so that everyone can try to play with their own raw data using free and open source tools: Yandex.Metrica Logs API, ClickHouse open-source DBMS, Python, Pandas, XGBoost. This session enables you to implement your own predictive customer journey.
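To make the evaluation step concrete, here is a minimal sketch of the ROC AUC metric in plain Python: the share of (buyer, non-buyer) pairs in which the buyer received the higher predicted purchase probability, with ties counted as half. The labels and scores below are invented for illustration and are not data or results from the case study; in practice a library implementation would normally be used.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical holdout set: 1 = purchased, 0 = did not purchase.
holdout_labels = [1, 0, 1, 0, 0, 1]
holdout_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]  # predicted purchase probabilities
auc = roc_auc(holdout_labels, holdout_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes the metric convenient for comparing models on the same holdout set.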
- True innovation starts with asking Big Questions
- New perspectives on the Big Data and Data Science challenges we face today
- How learning from the past can help you solve the problems of the future
The “Big Data” and “Data Science” rhetoric of recent years seems to focus mostly on collecting, storing and analysing existing data. Data which many seem to think they have “too much of” already. However, the greatest discoveries in both science and business rarely come from analysing things that are already there. True innovation starts with asking Big Questions. Only then does it become apparent which data is needed to find the answers we seek. In this session, we relive the true story of an epic voyage in search of data. A quest for knowledge that will take us around the globe and into the solar system. Along the way, we attempt to transmute lead into gold, use machine learning to optimise email marketing campaigns, experiment with sauerkraut, investigate a novel “Data Scientific” method for sentiment analysis, and discover a new continent. This ancient adventure brings new perspectives on the Big Data and Data Science challenges we face today. Come and see how learning from the past can help you solve the problems of the future.
- Public construction contracts are awarded through the competitive bidding process
- Presentation of a model that accurately predicts competitors’ bids
- Case study of client who achieved a CAGR of 46%
Each year, a significant number of public construction contracts are awarded through the competitive bidding process. For most contractors, this represents the proverbial "Catch 22": bid too high and you lose, bid too low and you jeopardize profit. What if you knew the other bids? Paul will present a model that accurately predicts competitors’ bids. The results are astonishing. One contractor increased its share of winning bids from 58% to 80% but, more importantly, maximized its profit potential for each award. With the help of “Confucius,” this client achieved a CAGR of 46%.
- GDPR regulations will bring some unsettling new requirements for data scientists
- Five topics of interest
- Illustrated with concrete examples built using open source software
The wide-ranging GDPR regulations to be implemented by all EU countries on 25 May 2018 will bring some unsettling new requirements for data scientists that use data considered personal in the EU – not just for “consumer” data but also for business-to-business data. This presentation focuses on five specific topics of interest: notification, permission for use, the right to be forgotten, discrimination and “pseudo-discrimination”, as well as anonymization. It will be illustrated with concrete examples built using open source software, so you can try a few of the ideas yourself.
Dinner with strangers:
meet your fellow attendees.
See the registration desk for more information
Predictive Analytics World for Business - London - Day 2 - Thursday, October 12th, 2017
- Time-sensitive machine learning model to mine customers’ demographic and behavior features and predict time of likely conversion
- Deep learning and survival analysis for extending traditional linear survival analysis to non-linear data
- Applying DeepSurvival makes it possible to outperform any of the traditional targeted advertising algorithms
Targeted advertising is a form of advertising that focuses on certain attributes of the customers. The advertisement should reach the consumer best suited to the company’s product at the right time. In this talk, we demonstrate a time-sensitive machine learning model that mines customers’ demographic and behavior features and predicts when the customer is likely to convert. We adopted deep learning and survival analysis (a Deep Cox Proportional Hazards Network) to extend traditional linear survival analysis to non-linear data. Our results demonstrated significant effectiveness in time-sensitive marketing campaigns. Applying DeepSurvival allows us to outperform any of the traditional targeted advertising algorithms.
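For readers unfamiliar with the Cox model, the following is a minimal sketch of the negative log partial likelihood that a Cox proportional hazards model minimizes. In a deep Cox network the risk score would come from a neural network rather than a linear combination of features. The sketch assumes no tied event times; the function name and the toy data are illustrative assumptions and do not reproduce the speakers' DeepSurvival implementation.

```python
import math


def cox_neg_log_partial_likelihood(times, events, risk_scores):
    """Negative log partial likelihood of a Cox model (no tied event times).

    times:       time of conversion or of censoring for each customer
    events:      1 if the customer converted, 0 if censored
    risk_scores: model output per customer (higher means earlier conversion)
    """
    total = 0.0
    for i, (t_i, converted) in enumerate(zip(times, events)):
        if not converted:
            continue  # censored customers only appear in risk sets
        # Risk set: every customer still under observation at time t_i.
        risk_set = [r for t, r in zip(times, risk_scores) if t >= t_i]
        log_denominator = math.log(sum(math.exp(r) for r in risk_set))
        total -= risk_scores[i] - log_denominator
    return total


# Hypothetical toy data: two conversions and one censored customer.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
loss_well_ranked = cox_neg_log_partial_likelihood(times, events, [2.0, 1.0, 0.0])
loss_badly_ranked = cox_neg_log_partial_likelihood(times, events, [0.0, 1.0, 2.0])
```

Scores that rank early converters highest give the lower loss, which is the signal a network trained on this objective would follow.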
- How one of the largest logistics companies in the Middle East automated the delivery of shipments using machine learning
- No postcodes in the region, instead customers need to give a description of the address
- How GPS was used to predict the delivery location using clustering and classification algorithms
This presentation will give an insight into how we helped one of the largest logistics companies in the Middle East to automate the delivery of their shipments using machine learning. The problem was particularly interesting because there are no postcodes in the region. Instead, our client would have to make a phone call to each customer and get a description of the address, e.g. house with brown door around XYZ roundabout. This process became more difficult because of the cultural dynamics existing amongst the demographics. This presentation will describe how we leveraged the GPS data the client had to predict the delivery location using clustering and classification algorithms, and how we operationalised the model in order to drive actions in real time.
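The abstract does not say which clustering algorithm was used, so the following is only a sketch of the general idea: group a customer's past GPS fixes with a simple greedy radius-based clustering and take the centre of the largest group as the predicted delivery location. The 50 m radius, the coordinates and all function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def centroid(points):
    """Arithmetic mean of coordinates; adequate only for small areas."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def cluster_by_radius(points, radius_m=50.0):
    """Greedy clustering: join the first cluster whose centre is within
    radius_m, otherwise start a new cluster."""
    clusters = []
    for p in points:
        for c in clusters:
            if haversine_m(p, centroid(c)) <= radius_m:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def predict_delivery_location(points, radius_m=50.0):
    """Centre of the largest cluster of past GPS fixes."""
    return centroid(max(cluster_by_radius(points, radius_m), key=len))


# Hypothetical GPS fixes from earlier deliveries, including one outlier.
fixes = [(25.2048, 55.2708), (25.2049, 55.2709), (25.2047, 55.2707), (25.2500, 55.3000)]
predicted = predict_delivery_location(fixes)
```

Taking the largest cluster rather than the overall mean keeps a single stray fix, such as a parcel handed over at a workplace, from pulling the prediction away from the usual address.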
- Data arrives in a continuous fashion and the data stream continuously refreshes and changes
- Fit regression models online, updating the parameters of the model as new data arrives, so that the model updates to reflect the data from the stream
- A Spark Streaming regression model to illustrate how to produce a better forecast than the traditional batch prediction method
In many applications, data arrive in a continuous fashion, in data streams of sensor data, transaction data, communication data, etc. The data stream continuously refreshes and changes. If time to insight is crucial, it is useful to fit regression models online, updating the parameters of the model as new data arrive, so that the model continually updates to reflect the data from the stream. In this session, I will walk through a Spark Streaming regression model to illustrate how to produce a much better forecast than the traditional batch prediction method.
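The idea of updating model parameters as each observation arrives can be sketched without Spark. The class below is a plain-Python illustration of online linear regression with one stochastic gradient step per data point; it is not the Spark API, and the class name, learning rate and synthetic stream are assumptions made for this example.

```python
class OnlineLinearRegression:
    """Linear model whose parameters are updated one observation at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        """One gradient step on the squared error of a single observation."""
        error = self.predict(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error
        return error


# Hypothetical noise-free stream generated by y = 2*x + 1.
model = OnlineLinearRegression(n_features=1)
for i in range(5000):
    x = [(i % 10) / 10.0]
    y = 2.0 * x[0] + 1.0
    model.update(x, y)  # the model is usable for prediction after every step
```

Because each update touches only the current observation, memory use stays constant regardless of how long the stream runs, and the model can follow a relationship that drifts over time.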
- How to design a predictive model to carry out a segmentation process with the massive data coming from Twitter
- Behavior, cluster analysis, feature engineering and ensemble methods
- Python and R languages to build the model
Customer segmentation is one of the most important aspects of marketing. It is a factor when launching any kind of campaign, whether for commercial purposes or for political targeting. In this session, Miguel Barros will explain how to design a predictive model to carry out a segmentation process with the massive data coming from Twitter. Topics covered will include user behavior, cluster analysis, feature engineering and ensemble methods. Both Python and R were used to build our models, making use of the advantages of each.
- Building a price prediction model for AutoScout24, making this model available and learning from user feedback
- Gaining transparency in current and future market values
- Helping each seller to find his individually optimized selling price
Scout24 is a leading operator of digital marketplaces specializing in the real estate and automotive sectors in Germany and other selected European countries. Scout24 aims to transform data into Market Insights to empower its users to make informed decisions. This session is a journey about building a price prediction model for AutoScout24, making this model available and learning from user feedback. After gaining transparency in current and future market values, the next steps are about helping each seller to find his individually optimized selling price, depending on the seller’s personal preferences regarding speed of sale and revenue.