Friday, April 29, 2016

Tinker, Juggler, Mathematician Guy: Claude Shannon Doodled by Google

Google's doodles, just like its profound search box, have been a constant source of rich and curious information for quite some time. And they are already much celebrated in the media.

Still, I found the most recent one, on a guy called Claude Shannon, interesting enough not only to note but also to write a blog post about. And while the doodles have of late been slipping in quality and creativity, this one seems to have put the shine back, what with a cute cartoon of Claude juggling zeroes and ones bang in the middle of the letters that make up Google, splitting it into GOO and GLE.

I also felt embarrassed, nay ashamed, that I had to refer to the pioneer of information theory, unarguably a great mathematical genius, as "a guy called..." in the paragraph above. Time.com headlined its piece on Shannon "The Juggling Unicyclist Who Pedaled Us Into the Digital Age."

Now, that's indeed quite a fitting and interestingly written tribute!

Let's look at how some of the other media sites and scientific portals describe him in their articles (before as well as after the doodle celebrating his birth centenary):

Without Shannon's information theory there would have been no internet (The Guardian)

Claude Shannon: Tinkerer, Prankster, and Father of Information Theory (IEEE Spectrum)

Claude Shannon: Reluctant Father of the Digital Age (MIT Technology Review)

Celebration time for Gaylord's Shannon, who 'changed the course of human history' (PetoskeyNews.com)

Keep doodling, Google!

Wednesday, April 27, 2016

The Big Data Tech behind Times Internet’s Native Ad Play - Colombia


Not many people may know it, but one of the largest publishers in India, The Times of India Group, is also home to one of the largest publisher-owned ad network platforms in APAC. The Group’s digital venture, Times Internet Ltd (TIL), runs one of the most complex and sophisticated ad serving operations, in addition to hosting the editorial content for multiple sites (both its own and those of partners).

About a year ago, TIL launched its own native ad platform, called Colombia. (Native ads are ads designed to match the look and feel of the editorial content around them. On a platform like Colombia they are also personalized, shown to web users based on their past browsing history, interests, etc., and are usually text ads as opposed to display/banner ads.)

Given the growing global trend of more and more brands putting their money into native ads, platforms such as Colombia are gaining increasing significance in the media world. The platform ensures a similar user experience across mobile and web.

I recently spoke to Sumit Malhotra, Head – IT, TIL, to learn more about how they developed Colombia, the technology behind the platform, and the challenges associated with what Sumit calls a “big data recommendation system.”

It would be pertinent to note in this context that the growth of the TIL ad network in the past two years has been nothing short of exponential: from serving around 40 million ad impressions per day last June to a peak of 500 million impressions per day this month.

Besides serving ads from its own marquee properties, TIL has tie-ups with third-party ad networks like Taboola and many others. So the ad inventory that is served through Colombia comes from a number of sources, all of which need to be tightly integrated with the TIL platform before being served to users, who could be browsing from any part of the world.

The key, says Malhotra, is to provide a consistent experience to the audience with the lowest possible latency (native ads are often served in 100 to 150 milliseconds, but in any case the serving time has to be kept below 500 ms).

“Otherwise, the user will have either scrolled down the page or moved on to another site (without seeing or clicking on the ad),” he says.
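To make that latency budget concrete, here is a minimal Python sketch, not TIL's actual code: the recommender call is given a hard deadline (the 500 ms ceiling Malhotra mentions) and a default "house" ad is returned if the deadline is missed. The names `serve_with_budget` and `FALLBACK_AD`, and the fallback behaviour itself, are illustrative assumptions.

```python
import concurrent.futures
import time

# Assumption: the 500 ms ceiling quoted in the interview, expressed in seconds.
LATENCY_BUDGET_S = 0.5
# Hypothetical default creative shown when personalization is too slow.
FALLBACK_AD = "house-ad"

# A small shared pool so each request does not pay thread start-up cost.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def serve_with_budget(recommend, user_id, budget_s=LATENCY_BUDGET_S):
    """Return recommend(user_id) if it finishes within budget_s, else FALLBACK_AD."""
    future = _executor.submit(recommend, user_id)
    try:
        return future.result(timeout=budget_s)
    except concurrent.futures.TimeoutError:
        # Too late: the user has probably scrolled past, so serve the default.
        # cancel() only helps if the task has not started yet; a call that is
        # already running finishes in the background and its result is dropped.
        future.cancel()
        return FALLBACK_AD


if __name__ == "__main__":
    def slow_recommender(user_id):
        time.sleep(0.2)  # simulate a lookup that overruns a 50 ms budget
        return f"ad-for-{user_id}"

    print(serve_with_budget(lambda u: f"ad-for-{u}", "u1"))          # ad-for-u1
    print(serve_with_budget(slow_recommender, "u2", budget_s=0.05))  # house-ad
```

The design choice here is that a late personalized ad is worth less than an on-time generic one, which mirrors Malhotra's point about users scrolling away.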

The biggest challenge for any ad network today is to deliver an ad with low latency. To do so, the calculations that decide which ad is served to which user profile must come from a big data recommendation system (Colombia, in TIL’s case) that runs in memory rather than on disk.

Talking about the challenges, Malhotra says, “Another challenge is that suppose a user is coming from the US and hitting our servers in India, so the travel time for a data packet is quite high; we need to take it closer to the audience. And since every ad is personalized, that is a big challenge.”

Part of the low-latency challenge is solved by having multiple ad server clusters in different geographic regions of the world. TIL has its servers hosted in different geographies and uses a mix of public cloud options and its own data centers. “This helps us serve ad requests within specific regions,” he says.
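In practice this kind of regional routing is often handled with DNS- or anycast-based routing; the Python sketch below only illustrates the underlying idea of sending each request to the geographically closest ad server cluster. The cluster names and coordinates are hypothetical, not TIL's actual locations, and straight-line (great-circle) distance is used as a rough stand-in for network latency.

```python
import math

# Hypothetical regional clusters: name -> (latitude, longitude) in degrees.
CLUSTERS = {
    "india": (19.08, 72.88),      # around Mumbai
    "us-east": (39.04, -77.49),   # around northern Virginia
    "singapore": (1.35, 103.82),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_cluster(user_latlon, clusters=CLUSTERS):
    """Pick the cluster with the smallest great-circle distance to the user."""
    return min(clusters, key=lambda name: haversine_km(user_latlon, clusters[name]))
```

For example, `nearest_cluster((40.71, -74.01))` (New York) picks `"us-east"` rather than sending the packet all the way to India, which is exactly the travel-time problem Malhotra describes.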

Also notable is that TIL custom-built its own big data engine using open source tools and technologies, all in-house, with its technology development team of more than 100 people.

It took slightly over a year to build Colombia and then another 9 to 12 months to roll it out fully across all online properties.

“Colombia was launched about eight months to a year back. And given that the infrastructure is huge, we had to roll it out to all the properties, 40 different brands across Times Internet. So the deployment also took time, as the technology needed to be integrated with the publishers as well. And after rolling out, you understand the issues and then scale it up slowly,” says Malhotra. Colombia is now fully deployed not only across all TIL properties but also with third parties.

The key benefit is derived from “the complex algorithms that help us run highly targeted campaigns,” he says.

Native ad platforms are one of the primary reasons why today’s users get the sense that, after they visit particular kinds of sites (sport, technology, housing, etc.), the ads on the sites they visit next are mostly related to the content they just viewed.

“Given that it’s a personalized ad network, we need to do in-memory data recommendations. For maintaining low latency, we cannot afford to do any calculations on data that goes to the disk, all calculations have to be done in-memory,” says Malhotra.
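As a rough illustration of what in-memory recommendation can mean, here is a minimal Python sketch that keeps user interest profiles and the ad inventory in RAM as plain dictionaries and scores each ad by how well its topics match the user's recent views. Colombia's actual algorithms are not described in this post; the class name, the topic-overlap scoring and the sample topics (borrowed from the sport/technology/housing examples above) are all assumptions.

```python
from collections import defaultdict


class InMemoryRecommender:
    """Hypothetical recommender: profiles and ads live only in process memory."""

    def __init__(self):
        # user_id -> {topic: accumulated interest weight}
        self.profiles = defaultdict(lambda: defaultdict(float))
        # ad_id -> set of topics the ad is about
        self.ads = {}

    def record_view(self, user_id, topic, weight=1.0):
        """Bump the user's interest in a topic when they view related content."""
        self.profiles[user_id][topic] += weight

    def add_ad(self, ad_id, topics):
        self.ads[ad_id] = set(topics)

    def recommend(self, user_id, k=1):
        """Return the k ads whose topics best match the user's profile.

        Nothing here touches disk: every lookup is a dictionary access.
        Ties are broken alphabetically by ad_id so results are deterministic.
        """
        # .get() avoids creating an empty profile for unknown users.
        profile = self.profiles.get(user_id, {})
        scored = [
            (sum(profile.get(topic, 0.0) for topic in topics), ad_id)
            for ad_id, topics in self.ads.items()
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [ad_id for _, ad_id in scored[:k]]
```

A user who has read two technology stories and one sport story would, with ads tagged `technology` and `sport`, get the technology ad first. A production system would of course need far richer signals and some way to share state across hundreds of servers.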

Malhotra says that most components of big data analytics systems today run on bare metal servers rather than virtualized ones, as virtualization adds another layer to the process and increases latency. At the scale and complexity at which TIL operates, he says, “we cannot survive any virtualization overhead.”

Asked why TIL didn’t opt for a branded big data analytics platform from one of the large vendors, he said: “Because those vendors cannot give you that kind of personalization in-memory at this rate and at this kind of a size/scale. So for example, if we are using, say, 300 servers for our open source solution, it would require 600 or more servers to do the same things on vendor products.”

Malhotra is also of the opinion that using a vendor product locks one into it and takes away flexibility.

So it looks like TIL is not going to take the outsourcing route in the near future.

To keep its ad network in good health, TIL also has to closely monitor how ads are being served across different geographies: first, to check that a consistent experience is being delivered to users; and second, to ensure that no campaign is surreptitiously trying to deliver malware to the end user’s device (in which case the campaign is stopped and/or blocked).

For this, TIL uses a mix of in-house resources as well as tools from third-party providers.
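For the first of those checks, one simple approach (an assumption on my part, not a description of TIL's actual tooling) is to collect serve-time samples per region and flag any region whose 95th-percentile latency breaches the 500 ms ceiling mentioned earlier. A minimal Python sketch, with hypothetical function names:

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def slow_regions(samples_ms, threshold_ms=500, pct=95):
    """Return, sorted by name, the regions whose pct-th percentile
    serve time (in milliseconds) exceeds threshold_ms.

    Regions with no samples are skipped rather than flagged.
    """
    return sorted(
        region
        for region, values in samples_ms.items()
        if values and percentile(values, pct) > threshold_ms
    )
```

Looking at a high percentile rather than the average matters here: a region can have a healthy mean serve time while a meaningful share of its users still wait too long to ever see the ad.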

The challenge now is that, sometimes, the traffic at multiple sites can peak at the same time, which requires a different kind of scalability.


(Note: This blog post first appeared on dynamicCIO.com)