
Monday, July 16, 2018

How Paytm Uses Tech to Manage 200 Million Users

Key points:

- Paytm processed 1 billion transactions in the quarter ended March 2018
- The firm employs 200 product managers and over 700 engineers
- Its data science lab in Toronto, Canada, develops key tech tools
- App analytics and machine learning are used to retain users and for up-selling

Mobile wallets--mobile apps used to pay for recharges, groceries and other daily items--may have come of age in an increasingly digital India, but much goes on behind the scenes to keep them working well and users hooked.

Paytm, which has 200 million monthly active users and processed close to 1 billion transactions in the quarter ended March 2018, is a case in point. It competes with MobiKwik, FreeCharge, PhonePe and several others in this space.

Discussing the tech strategy at the company in a recent interview, Deepak Abbot, senior vice president of One97 Communications Ltd, which owns and operates Paytm, said, “Though a payments firm, we are a technology company at the core and everyone here, including Vijay, is a hardcore techie--he even calls mid-level engineers sometimes to discuss architecture design.” (Vijay Shekhar Sharma is the chief executive of Paytm.)

Abbot said that most people in top positions at the company either have a technology background or are “quite comfortable” with tech. “Culturally, we have a tech mindset. That is another reason we have been able to build a very complex product in a flexible way.”

Sharing insights into what goes on ‘under the hood’ as they say in tech, Abbot said that quick decision-making and a product-centric approach drive software development. “In our meetings, once an idea is crystallized, Vijay is very clear about what product to build. As a result, the product managers are also clear how to get it done. And when the engineers are given very specific details, they are able to quickly build it,” he revealed.

The simplicity of the Paytm app belies its complex architecture and the number of people that work on it. For instance, Abbot said that there are as many as 200 product managers and 700-800 engineers working on different aspects of the app.

But how does Paytm define a product? “At Paytm, a product is defined as anything a consumer—be it an end consumer, a merchant or a marketplace seller--interacts with,” said Abbot. For example, recharge is a product in itself. Paytm’s implementation of Unified Payments Interface (UPI), again, is a product (UPI is an easy, instantaneous payment system built by the National Payments Corporation of India or the NPCI). “And then you build use-cases on top of UPI such as P2P, P2M and B2B payments. Wallet--the most used product of Paytm--is another,” said Abbot. (P2P, P2M and B2B stand for person-to-person, person-to-merchant and business-to-business respectively.)

The idea of keeping all these products within the same Paytm app, according to him, is that users should move from one product to another seamlessly—something that requires “a highly scalable product” to be built.

Integration of multiple products within the same app also helps Paytm cross-sell more easily to customers, who may first use one product before being “nudged towards” others, said Abbot.

Talking about stickiness of the app and up-selling to users, he said, “We have observed that if a customer has only used Paytm for recharge, then the retention rate for such a user is 40% after three months. But if we can upgrade him to send money to others, they become power users of Paytm and the retention improves dramatically to 70%.”

Industry experts forecast bright days ahead for mobile wallets. The number of mobile wallet users is expected to grow from the current 200-250 million to around 500 million in the next couple of years, according to Probir Roy, co-founder of Paymate and an independent director at Nazara Technologies. While he believes that “the next big thing” will be “interoperability” among different wallets, he noted that it is a tough space to operate in and some consolidation is “bound to happen” in the coming years. “My guess is that the top two or three companies will have 80% of the market,” he concluded.

------Paytm Labs: Managing customer lifecycles-----

To make the most of app analytics that capture user behaviour, Paytm’s data science lab, Paytm Labs, in Toronto, Canada, works on developing multiple software tools. One such key tool is CLM or Customer Lifecycle Management.

According to Deepak Abbot, senior vice president of the company, who is based at Paytm’s headquarters in Noida near Delhi, what CLM does is “catch every ‘signal’ from the app”. Explaining how it works, he said, “If you use the app for UPI, it segments you as a UPI user; if you do a recharge, it marks you as a recharge user. It also upgrades you automatically based on your behaviour or purchase history. So, for instance, if you make an electricity bill payment or a post-paid bill payment, it upgrades you to a post-paid user.” There is a lot of granularity built into the CLM tool to classify and reward different levels of users at different times.
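To make the idea concrete, here is a minimal sketch of what such signal-to-segment classification could look like; the event names, segment labels and mapping rules are my own illustrative assumptions, not Paytm’s actual CLM code.

```python
# Hypothetical illustration of signal-based user segmentation
# (not Paytm's actual implementation).

# Map app events ("signals") to the segment they imply.
EVENT_TO_SEGMENT = {
    "upi_payment": "upi_user",
    "mobile_recharge": "recharge_user",
    "electricity_bill_payment": "postpaid_user",
    "postpaid_bill_payment": "postpaid_user",
    "money_transfer": "power_user",
}

def update_segments(user_segments: set, event: str) -> set:
    """Add the segment implied by a new app event to the user's profile."""
    segment = EVENT_TO_SEGMENT.get(event)
    if segment:
        user_segments.add(segment)
    return user_segments

# Example: a recharge user who then pays an electricity bill
segments = set()
for event in ["mobile_recharge", "electricity_bill_payment"]:
    segments = update_segments(segments, event)
print(segments)  # e.g. {'recharge_user', 'postpaid_user'}
```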

The tool puts users in different segments and generates actionable triggers accordingly. “For example, if a premium user who earlier made a money transfer of Rs 5,000 has not used the app for a month, he will be shown a cashback offer or an ad on Facebook,” said Abbot. Similarly, alerts are shown for soon-to-expire mobile recharges and other bills. “The CLM tool uses such alerts and offers to get those customers back into the app. And if they are already in the app, it will customise the view for them by showing up frequently used icons upfront and hiding others,” he explained.
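The kind of trigger rule Abbot describes might be sketched roughly as below; the 30-day inactivity window, the Rs 5,000 threshold and the channel names simply mirror his example and are not Paytm’s actual rules.

```python
# Hypothetical re-engagement trigger, loosely modelled on the example above.
from datetime import datetime, timedelta

def pick_trigger(user: dict):
    """Return a re-engagement action for a lapsed high-value user, if any.

    `user` is assumed to carry 'last_active' and 'last_transfer_amount';
    the thresholds below are illustrative, not Paytm's real values.
    """
    inactive_for = datetime.now() - user["last_active"]
    if (inactive_for > timedelta(days=30)
            and user.get("last_transfer_amount", 0) >= 5000):
        return {"action": "cashback_offer", "channels": ["in_app", "facebook_ad"]}
    return None

user = {
    "last_active": datetime.now() - timedelta(days=45),
    "last_transfer_amount": 5000,
}
print(pick_trigger(user))  # {'action': 'cashback_offer', 'channels': [...]}
```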

All the user data from the Paytm app goes into a “data lake”, and the team in Canada uses it to formulate the rules of the risk engine and other software. The data lake, Abbot explained further, is a repository of multiple sources of data, including phone usage data, hardware data and the address book; then there is transactional data plus behavioural data (where users navigate inside the app, how much time they spend shopping, etc.). All this data is fed through machine learning (ML) algorithms so that alerts and promotions can be automated and personalized.
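As a rough illustration of how records from such a data lake might be turned into features for an ML model, consider the sketch below; the field names and the toy scoring rule are assumptions for illustration, not the actual pipeline.

```python
# Hypothetical sketch: flatten data-lake records into features and score a user
# for a personalised promotion. Field names and weights are illustrative only.

def build_features(device_data: dict, transactions: list, behaviour: dict) -> dict:
    """Combine records from different data-lake sources into one feature dict."""
    return {
        "txn_count_30d": len(transactions),
        "avg_txn_value": (sum(transactions) / len(transactions)) if transactions else 0.0,
        "sessions_per_week": behaviour.get("sessions_per_week", 0),
        "device_tier": device_data.get("device_tier", "unknown"),
    }

def promotion_score(features: dict) -> float:
    """Toy stand-in for an ML model that ranks users for a promotion."""
    return (0.5 * features["txn_count_30d"]
            + 0.3 * features["sessions_per_week"]
            + 0.2 * (features["avg_txn_value"] / 1000))

features = build_features(
    device_data={"device_tier": "mid"},
    transactions=[250, 1200, 499],
    behaviour={"sessions_per_week": 4},
)
print(promotion_score(features))
```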

The Toronto team comprises 70 data scientists and engineers and, besides the CLM tool, has developed the company’s risk and customer score engines. “We just plug those products here (in India) and start using them,” said Abbot.

---##----

(Note: An edited version of the above post first appeared on www.livemint.com - where I used to work until recently. The interaction with Deepak Abbot took place during my Mint tenure.)

Sunday, September 11, 2016

Big Data Analytics and the Global Hunger Challenge

(Image credit: Pixabay.com)

In a world where as many as one in nine people (around 800 million of a global population of over 7 billion) go hungry each day, 33% of the food produced for human consumption is wasted every year.

As regards India, it is home to the largest undernourished and hungry population in the world: 15.2% of India’s population is undernourished and 194.6 million people go hungry every day, according to India FoodBanking Network.

Certainly not a healthy picture—but possibly not one that technology cannot help redress.

According to a new report on McKinsey.com, global food waste and loss cost a staggering $940 billion a year, with a carbon footprint of more than 8% of global greenhouse-gas emissions and a blue-water footprint that is 3.6 times the annual consumption of the US.

Such a sorry state of the global food chain can be set right with the appropriate use of digital innovation, including big data analytics.

In my view, there is opportunity not just for governments but also for large businesses that plug into the huge global food supply chain in one way or another: the opportunity to apply creative thinking led by digital tools to bring down wastage, optimize costs and put more food on the table of poor people.

The McKinsey report suggests that cutting postharvest losses in half would produce enough food to feed a billion more people.

These and other social and economic benefits can be achieved by using technology to improve areas such as climate forecasting, demand planning, and the management of end-of-life products, argues McKinsey. The report quotes examples of work being done by startups and others in this area. For instance, a French startup, Phenix, runs a web-based marketplace to connect supermarkets holding end-of-life food stocks with NGOs and consumers who could use them. “The platform enables the supermarkets to save the costs of disposal, gives consumable products a second life, and alleviates some of the social and environmental burden of waste,” it says.

For emerging economies such as India, the report suggests that innovations like precision agriculture, supply-chain efficiencies and agriculture-focused payment systems can make a huge difference.

For one, precision agriculture—which uses big data analytics, aerial imagery, sensors, etc.—is used to observe, measure and analyze the needs of individual fields and crops rather than take a one-size-fits-all approach to farming in a region or cluster of fields.
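To illustrate the field-by-field idea, here is a tiny sketch of how per-field sensor readings could drive individual recommendations instead of a blanket schedule; the fields, crops and moisture thresholds are entirely hypothetical.

```python
# Illustrative only: per-field irrigation advice from sensor readings,
# rather than one schedule for a whole region.

FIELDS = {
    "field_a": {"soil_moisture": 0.12, "crop": "wheat"},
    "field_b": {"soil_moisture": 0.34, "crop": "wheat"},
    "field_c": {"soil_moisture": 0.08, "crop": "maize"},
}

# Hypothetical crop-specific thresholds below which irrigation is advised.
MOISTURE_THRESHOLD = {"wheat": 0.20, "maize": 0.15}

for name, reading in FIELDS.items():
    threshold = MOISTURE_THRESHOLD[reading["crop"]]
    action = "irrigate" if reading["soil_moisture"] < threshold else "hold"
    print(f"{name}: {action} (moisture={reading['soil_moisture']:.2f}, threshold={threshold:.2f})")
```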

Startups as well as behemoths are participating in this huge opportunity (the market for agricultural robotics alone is forecast to grow from $1 billion in 2014 to as much as $18 billion by 2020).

So, while the startup Blue River uses computer vision and robotics to determine the needs of individual plants, Big Blue (also known as IBM) has developed a highly precise weather-forecast technology, Deep Thunder, and an agriculture-specific cloud technology.

Needless to say, we will need a basket of technologies from multiple vendors to keep large amounts of food from being thrown away or going to waste, to optimize the yield from agriculture, to eliminate or reduce transportation inefficiencies—and to do anything and everything to bring down the number of people who go hungry every day.


Wednesday, April 27, 2016

The Big Data Tech behind Times Internet’s Native Ad Play - Colombia


Not many people may know it, but one of the largest publishers in India, The Times of India Group, is also home to one of the largest publisher-owned ad network platforms in APAC. The Group’s digital venture, Times Internet Ltd (TIL), runs one of the most complex and sophisticated ad-serving operations, in addition to hosting the editorial content for multiple sites (its own as well as partner sites).

Around a year ago, TIL launched its own native ad platform, called Colombia. (Native ads are personalized ads shown to web users based on their past browsing history, interests, etc., and are usually text ads as opposed to display/banner ads.)

Given the growing global trend of more and more brands putting their money on native ads, platforms such as Colombia are gaining increasing significance in the media world. The platform ensures similar user experience across mobile and web.

I recently spoke to Sumit Malhotra, Head – IT, TIL, to know more about how they developed Colombia, the technology behind the platform and challenges associated with what Sumit calls a “big data recommendation system.”

It is pertinent to note in this context how the TIL ad network has grown in the past two years: nothing short of exponentially, from around 40 million ad impressions per day last June to a peak of 500 million impressions a day this month.

Besides serving ads from its own marquee properties, TIL has tie-ups with third party ad networks like Taboola and many others. So the ad inventory that is served through Colombia comes from a number of sources, all of which need to be integrated tightly with the TIL platform for serving to the user—who could be sitting and surfing in any part of the world. 

The key, says Malhotra, is to provide a consistent experience to the audience with as low latency as possible (often, the native ads are served in 100 to 150 milliseconds but, in any case, the threshold has to be kept below 500 ms).

“Otherwise, the user will have either scrolled down the page or moved on to another site (without seeing or clicking on the ad),” he says.

The biggest challenge for any ad network today is to deliver ads at low latency. To do so, the calculations and permutations that determine which ad is served to which user profile must be based on the recommendations of the big data system—Colombia in TIL’s case—which runs in-memory rather than on disk.
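A minimal sketch of what in-memory ad selection under a latency budget could look like is shown below; the 500 ms threshold mirrors the figure quoted above, but the data structures and fallback behaviour are my own illustrative assumptions, not Colombia’s implementation.

```python
# Hypothetical sketch of in-memory ad selection under a latency budget.
import time

# Pre-computed, in-memory index: user segment -> ranked ad candidates.
AD_INDEX = {
    "cricket_fan": ["ad_sports_gear", "ad_streaming_pack"],
    "home_buyer": ["ad_home_loan", "ad_furniture_sale"],
}

LATENCY_BUDGET_SECONDS = 0.5  # serve within 500 ms or fall back

def serve_ad(user_segment: str, fallback: str = "ad_house_default") -> str:
    start = time.monotonic()
    candidates = AD_INDEX.get(user_segment, [])      # pure in-memory lookup, no disk I/O
    best = candidates[0] if candidates else fallback  # highest-ranked candidate
    if time.monotonic() - start > LATENCY_BUDGET_SECONDS:
        return fallback                               # budget exceeded: serve a safe default
    return best

print(serve_ad("cricket_fan"))  # ad_sports_gear
```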

Talking about the challenges, Malhotra says, “Another challenge is that suppose a user is coming from the US and hitting our servers in India, so the travel time for a data packet is quite high; we need to take it closer to the audience. And since every ad is personalized, that is a big challenge.”

Part of the low-latency challenge is solved by having multiple ad server clusters in different geographic regions of the world. TIL has its servers hosted in different geographies and uses a mix of public cloud options and its own data centers. “This helps us serve ad requests within specific regions,” he says.
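Routing a request to the nearest serving region might look something like the following; the region names, country mappings and hostnames are placeholders, not TIL’s actual topology.

```python
# Illustrative geo-routing of ad requests to the nearest serving region.

REGION_OF_COUNTRY = {
    "IN": "ap-south",
    "SG": "ap-south",
    "US": "us-east",
    "GB": "eu-west",
}

AD_CLUSTERS = {
    "ap-south": "ads-ap.example.net",
    "us-east": "ads-us.example.net",
    "eu-west": "ads-eu.example.net",
}

def route_request(country_code: str, default_region: str = "ap-south") -> str:
    """Pick the ad-server cluster closest to the user's country."""
    region = REGION_OF_COUNTRY.get(country_code, default_region)
    return AD_CLUSTERS[region]

print(route_request("US"))  # ads-us.example.net
```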

Another notable point is that TIL has custom-built its own big data engine using open-source tools and technologies--all done in-house by its 100-plus-strong technology development team.
It took slightly over a year to build Colombia and then another 9 to 12 months to roll it out fully across the board for all online properties.

“Colombia was launched about eight months to a year back. And given that the infrastructure is huge, we had to roll it out to all the properties, 40 different brands across Times Internet. So the deployment also took time, as the technology needed to be integrated with the publishers as well. And after rolling out, you understand the issues and then scale it up slowly,” says Malhotra. Now it is fully deployed not only across all TIL properties but with the third parties as well.

The key benefit is derived from “the complex algorithms that help us run highly targeted campaigns,” he says.

Native ad platforms are one of the primary reasons why today’s users get the sense that, after they visit particular kinds of sites (sport, technology, housing, etc.), the ads on the sites they visit next are mostly related to the content they had just viewed.

“Given that it’s a personalized ad network, we need to do in-memory data recommendations. For maintaining low latency, we cannot afford to do any calculations on data that goes to the disk, all calculations have to be done in-memory,” says Malhotra.

Malhotra says that most components of big data analytics systems today run on bare metal servers rather than virtualized ones, as the virtualization adds another layer to the process and increases the latency. With the scale and complexity that TIL operates in, he says, “we cannot survive any virtualization overhead.”

On being asked why TIL didn’t go for a branded big data analytics platform (from known large vendors), he had this to say: “Because those vendors cannot give you that kind of personalization in-memory at this rate and at this kind of a size/scale. So for example, if we are using, say, 300 servers for our open source solution, it would require 600 or more servers to do the same things on vendor products.”

Also, Malhotra is of the opinion that using a vendor product locks one into it and takes away flexibility.

So it looks like TIL is not going to take the outsourcing route in the near future.

To keep its ad network in good health, TIL also has to do a lot of monitoring of how ads are being served across different geographies: one, to check whether a consistent experience is being delivered to the user; and two, to check whether a campaign is surreptitiously trying to deliver malware to the end user’s device (in which case the campaign is stopped and/or blocked).

For this, TIL uses a mix of in-house resources as well as tools from third-party providers.

The challenge now is that, sometimes, the traffic at multiple sites can peak at the same time, which requires a different kind of scalability.


(Note: This blog post first appeared on dynamicCIO.com)

Tuesday, January 26, 2016

Leading Data Scientist Talks about, Well, What Data Scientists Do!

Just as data keeps proliferating all around us, there is a great hue and cry about what to do with all those terabytes, petabytes, exabytes…whatever bytes you can name! Sure, there are ever more powerful number-crunching machines and ever more capable software, but at the end of the day, you are going to need professionals especially skilled in the science of data analysis, management and insights.

That would be the Data Scientist, a role dubbed by some as the sexiest job of this century. Sexy not necessarily in terms of what the work involves, but certainly in terms of the high demand and even higher pay packets.

But what exactly would these data scientists do?

An illuminating blog entry on this very interesting and still intriguing question was posted recently by Bernard Marr, an analytics expert and founder of the Advanced Performance Institute. To demystify what the work of a data scientist actually involves, and what sort of person is likely to be successful in the field, Marr spoke to one of the world’s leading data scientists, Dr. Steve Hanks, who holds a doctorate from Yale and has worked with companies like Amazon and Microsoft.

Currently the Chief Data Scientist at Whitepages.com (whose Contact Graph database contains information for over 200 million people and which is searched 2 billion times a month), Dr. Hanks talks about some key attributes of a data scientist: One, they have to understand that data has meaning; Two, they have to understand the problem that they need to solve, and how the data relates to that; and Three, they have to understand the engineering (behind delivering a solution).

While all three of these capabilities are important, writes Marr, it doesn’t mean there’s no room for specialization. He quotes Hanks as saying that it is “virtually impossible to be an expert in all three of those areas, not to mention all the sub-divisions of each of them.” The important thing here is that even if one specializes in one of these areas, one at least has good appreciation of all of them. Further, in Hanks’ words: “Even if you’re primarily an algorithm person or primarily an engineer—if you don’t understand the problem you’re solving and what your data is, you’re going to make bad decisions.”

I can especially identify with the “holistic appreciation” quality of data scientists, as many CIOs and development project heads have often shared similar sentiments about most code writers: they are too narrowly focused on the “problem” at hand and usually miss the big picture about the whole project.

Fortunately, unlike the job of a programmer, the field of data science is attracting or likely to attract people “of different personality types and mindsets.”

Having said that, the main challenge for data scientists is not in specializing in a particular machine learning algorithm or a particular sub-field or tool, but in keeping up with the general speed of development in data science, the blog notes.

For more interesting details and insights, I would urge you to read the full blog post.

Do let me know what you think of the fast-emerging field of Data Science.


(Note: This blog post first appeared on dynamicCIO.com. Image courtesy: Americanis.net)