Massachusetts Institute of Technology researchers have developed a method to aggregate and organize fragmented information from across the Web, something Google has been working on for years through a project called Knowledge Graph.
MIT Sloan Professor Cynthia Rudin and colleagues Benjamin Letham and Katherine Heller developed the crowdsourcing algorithm to aggregate and process big data collected from Web sites, blogs and social sites, and to automate the repetitive cycle of entering keywords to query for information.
Rudin said the algorithm competes directly with a Google product called Google Sets.
MIT's algorithm makes sense of posts drawn from a variety of content across the Internet, weighting them by the author's expertise and the Web site's authority. It reads many Web sites simultaneously, pulling in information both directly and indirectly related to the initial keywords as it builds its list. For example, if the searcher entered the keywords "Barack Obama" and "Scott Brown," the algorithm would also construct searches for "Nancy Pelosi" and "John McCain" while assembling its list of politicians.
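The article does not give implementation details, but a minimal sketch of the general idea, growing a list from seed items by letting pages that list pairs of seeds together vote for new candidates, might look like the following. The function names, toy corpus, and scoring rule are illustrative assumptions, not the researchers' actual method.

```python
# Minimal sketch of seed-set expansion (not the MIT implementation).
# Pages that list items alongside pairs of seeds vote for adding those
# items to the growing list; the toy corpus below is made up.
from collections import Counter
from itertools import combinations

def expand_list(seeds, corpus, rounds=2, top_k=2):
    """Grow a list from seed items by counting co-occurrences on pages."""
    found = set(seeds)
    for _ in range(rounds):
        votes = Counter()
        for a, b in combinations(sorted(found), 2):
            for page_items in corpus:
                # A page supports a seed pair if it lists both items.
                if a in page_items and b in page_items:
                    for item in page_items:
                        if item not in found:
                            votes[item] += 1
        new_items = [item for item, _ in votes.most_common(top_k)]
        if not new_items:
            break
        found.update(new_items)
    return found

# Toy "Web pages", each reduced to the list items it contains.
toy_corpus = [
    ["Barack Obama", "Scott Brown", "Nancy Pelosi"],
    ["Barack Obama", "Scott Brown", "John McCain", "Nancy Pelosi"],
    ["Barack Obama", "Mitt Romney"],
]

print(expand_list(["Barack Obama", "Scott Brown"], toy_corpus))
```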
According to Rudin, the algorithm could return results almost instantly if it had unrestricted access to a search engine. "At the moment, Google prevents anyone from doing more than a few search queries a minute; but it is an artificial restriction that they can remove," she said.
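As a rough illustration of what that restriction means in practice, a query loop that respects a few-queries-per-minute cap could be throttled like this; the rate value and the search_for stub are assumptions for illustration only.

```python
# Sketch of a throttled query loop; the per-minute cap and the
# search_for() stub are illustrative assumptions.
import time

QUERIES_PER_MINUTE = 4            # assumed rate limit
DELAY = 60.0 / QUERIES_PER_MINUTE

def search_for(pair):
    """Stand-in for a real search-engine call."""
    query = "-".join(pair).replace(" ", "_")
    return [f"http://example.com/results-for-{query}"]

def run_queries(seed_pairs):
    results = {}
    for pair in seed_pairs:
        results[pair] = search_for(pair)
        time.sleep(DELAY)         # honor the per-minute cap
    return results

print(run_queries([("Barack Obama", "Scott Brown")]))
```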
How much information does the Internet hold?
The global Internet population now numbers more than 2.1 billion people, and every piece of data they share leaves a digital footprint. An infographic from DOMO founder Josh James breaks down the amount of data generated per minute. As of June 2012, every minute the mobile Web gains 217 new users, YouTube users upload 48 hours of video, Facebook users share 684,478 pieces of content, Instagram users share 3,600 photos, and Tumblr sees 27,778 new posts published.
Rudin said the algorithm relies on explicit social signals, not implicit ones. "The explicit social signal we use: If someone lists an item on a Web page in a similar way to the seed items, they are telling us that it is a potentially useful piece of information that might go on the list we construct," Rudin explains. "There's definitely a possibility to put the implicit social signals directly into the algorithm."
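One way to picture that explicit signal: treat any HTML list that already contains a seed item as a vote for the other items it lists. The sketch below, including its parsing and scoring choices, is a simplified guess at the idea rather than the researchers' actual code.

```python
# Sketch of the "explicit signal": if a page lists a candidate item in the
# same HTML list as a seed item, that listing counts as a vote for the
# candidate. Works for simple, non-nested lists only.
from collections import Counter
from html.parser import HTMLParser

class ListItemParser(HTMLParser):
    """Collect the text of <li> elements, grouped by their parent list."""
    def __init__(self):
        super().__init__()
        self.lists, self._current, self._in_li = [], [], False
    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._current = []
        elif tag == "li":
            self._in_li = True
    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self._current:
            self.lists.append(self._current)
            self._current = []
        elif tag == "li":
            self._in_li = False
    def handle_data(self, data):
        if self._in_li and data.strip():
            self._current.append(data.strip())

def score_candidates(pages_html, seeds):
    """Count, per candidate item, how many seed-containing lists mention it."""
    votes = Counter()
    for html in pages_html:
        parser = ListItemParser()
        parser.feed(html)
        for items in parser.lists:
            if any(seed in items for seed in seeds):
                for item in items:
                    if item not in seeds:
                        votes[item] += 1
    return votes

page = "<ul><li>Barack Obama</li><li>Scott Brown</li><li>Nancy Pelosi</li></ul>"
print(score_candidates([page], {"Barack Obama", "Scott Brown"}))
```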
There are challenges. While this is Rudin's first iteration, Google has been working on search for years and has access to more implicit information than the MIT researchers do. For example, the MIT algorithm cannot guess the age or gender of the searcher.
Very interesting article, thanks. Could you please explain further what you mean by implicit social signals? Many thanks.
Cynthia Rudin replied (re: implicit signals): "We aren't trying to calculate things about the people who post lists on their Web pages. For instance, we don't try to guess their age, gender, or political biases. We just use the information on the page that we think they want other people to use. We could try to, for instance, estimate their audience or include other similar things, but we don't do that today."