algorithm – The Extensible Librarian

Do You Know Yewno, and If Yewno, Exactly What Do You Know?


Gavin Ferriby, June 19, 2017

This is the third of the "undiscovered summer reading" posts; see also the first and second.

At the recent Association of College and Research Libraries conference in Baltimore I came across Yewno, a search-engine-like discovery or exploration layer that I had heard about. I suspect that Yewno or something like it could be the "next big thing" in library and research services. I have served as a librarian long enough both to be very interested and to be wary at the same time --so many promises have been made by the information technology commercial sector, and the reality has fallen far short --remember the hype about discovery services?

Yewno is a so-called search app; it "resembles a search engine --you use it to search for information, after all--but its structure is network-like rather than list-based, the way Google's is. The idea is to return search results that illustrate relationships between relevant sources" --mapping them out graphically (like a mind map). Those words are quoted from Adrienne LaFrance's Atlantic article on the growing understanding of the Antikythera mechanism as an example of computer-assisted associative thinking (see, all these readings really come together). LaFrance traces the historical connections between "undiscovered public knowledge," Vannevar Bush's Memex (machine) in the epochal As We May Think, and Yewno. The hope is that through use of an application such as Yewno, associations could be traced between ancient time-keeping, Babylonian and Arabic mathematics, medieval calendars, astronomy, astrological studies, ancient languages, and other realms of knowledge. At any rate, that's the big idea, and it's a good one.

So who is Yewno meant for, and what's it based on?

LaFrance notes that Yewno "was built primarily for academic researchers," but I'm not sure that's true, strictly. When I visited the Yewno booth at ACRL, I thought several things at once: 1) this could be very cool; 2) this could actually be useful; 3) this is going to be expensive (though I have neither requested nor received a quote); and 4) someone will buy them, probably Google or another technology octopus. (Subsequent thought: where's Google's version of this?) I also thought that intelligence services and corporate intelligence advisory firms would be very, very interested --and indeed they are. Several weeks later I read Alice Meadows' post, "Do You Know About Yewno?" on the Scholarly Kitchen blog, and her comments put Yewno in clearer context. (Had I access to Yewno, I would have searched, "yewno.")

Yewno is a start-up venture by Ruggero Gramatica (if you're unclear, that's a person), a research strategist with a background in applied mathematics (Ph.D. King's College, London) and M.B.A. (University of Chicago). He is first-named author of "Graph Theory Enables Drug Repurposing," a paper (DOI) on PLOS One that introduces:

a methodology to efficiently exploit natural-language expressed biomedical knowledge for repurposing existing drugs towards diseases for which they were not initially intended. Leveraging on developments in Computational Linguistics and Graph Theory, a methodology is defined to build a graph representation of knowledge, which is automatically analysed to discover hidden relations between any drug and any disease: these relations are specific paths among the biomedical entities of the graph, representing possible Modes of Action for any given pharmacological compound. We propose a measure for the likeliness of these paths based on a stochastic process on the graph.
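The idea in that abstract, scoring paths through a knowledge graph by a stochastic process, can be made concrete with a small sketch. This is not the authors' method and certainly not Yewno's code: it assumes a simple random walk in which the chance of following an edge is proportional to its weight, and every node name and weight below is invented for illustration.

```python
# Toy knowledge graph. Every node and edge weight here is hypothetical,
# chosen only to illustrate the idea of scoring paths with a random walk.
graph = {
    "drug_X": {"protein_P": 3.0, "protein_Q": 1.0},
    "protein_P": {"pathway_R": 1.0, "disease_Z": 3.0},
    "protein_Q": {"disease_Z": 1.0},
    "pathway_R": {"disease_Z": 1.0},
    "disease_Z": {},
}


def transition_probs(graph, node):
    """Probability of stepping to each neighbor, proportional to edge weight."""
    weights = graph[node]
    total = sum(weights.values())
    return {neighbor: weight / total for neighbor, weight in weights.items()}


def simple_paths(graph, start, goal, path=None):
    """Yield every path from start to goal that visits no node twice."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbor in graph[start]:
        if neighbor not in path:
            yield from simple_paths(graph, neighbor, goal, path)


def path_likelihood(graph, path):
    """Likelihood of a path: the product of its step-by-step probabilities."""
    likelihood = 1.0
    for here, there in zip(path, path[1:]):
        likelihood *= transition_probs(graph, here)[there]
    return likelihood


def ranked_paths(graph, start, goal):
    """All simple paths from start to goal, most likely first."""
    scored = [(path_likelihood(graph, p), p) for p in simple_paths(graph, start, goal)]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for likelihood, path in ranked_paths(graph, "drug_X", "disease_Z"):
        print(f"{likelihood:.4f}  {' -> '.join(path)}")
```

The point of the sketch is only that a "hidden relation" here is nothing more mysterious than a chain of recorded associations, ranked by how strongly each link is attested in the source texts.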

Yewno does the same thing in other contexts:

an inference and discovery engine that has applications in a variety of fields such as financial, economics, biotech, legal, education and general knowledge search. Yewno offers an analytics capability that delivers better information and faster by ingesting a broad set of public and private data sources and, using its unique framework, finds inferences and connections. Yewno leverages on leading edge computational semantics, graph theoretical models as well as quantitative analytics to hunt for emerging signals across domains of unstructured data sources. (source: Ruggero Gramatica's LinkedIn profile)

This leads to several versions of Yewno: Yewno Discover, Yewno Finance, Yewno Life Sciences, and Yewno Unearth. Ruth Pickering, the company's co-founder and Chief Business Development & Strategy Officer, comments, "each vertical uses a specific set of ad-hoc machine learning based algorithms and content. The Yewno Unearth product sits across all verticals and can be applied to any content set in any domain of information." Don't bother calling the NSA --they already know all about it (and probably use it, as well).

Yewno Unearth is relevant to multiple functions of publishing: portfolio categorization, the ability to spot gaps in content, audience selection, editorial oversight and description, and other purposes for improving a publisher's position, both intellectually and in the information marketplace. So Yewno Discover is helpful for academics and researchers, but the whole of Yewno is also designed to relay more information about them to their editors, publishers, funders, and those who will in turn market publications to their libraries. Elsevier, Ebsco, and ProQuest will undoubtedly appear soon in librarians' offices with Yewno-derived information, and that encounter could well prove to be truly intimidating. So Yewno might be a very good thing for a library, but not simply an unalloyed very good thing.

So what is Yewno really based on? The going gets more interesting.

Meadows notes that Yewno's underlying theory emerged from the field of complex systems at the foundational level of econophysics, an inquiry "aimed at describing economic and financial cycles utilizing mathematical structures derived from physics." The mathematical framework, involving uncertainty, stochastic (random probability distribution) processes, and nonlinear dynamics, came to be applied to biology and drug discovery (hello, Big Pharma). This kind of information processing is described in detail in a review article, "Deep Learning," in Nature (Vol. 521, 28 May 2015, doi: 10.1038/nature14539). A development of machine learning, deep learning "allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction." Such deep learning "discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer." Such "deep convolutional nets" have brought about significant breakthroughs in processing images, video, and speech, and "recurrent nets" have brought new learning powers to "sequential data such as text and speech."

The article goes on in great detail, and I do not pretend I understand very much of it. Its discussion of recurrent neural networks (RNNs), however, is highly pertinent to libraries and discovery. The backpropagation algorithm is basically a process that adjusts the weights used in machine analysis during training: the error in the output is passed backward through the layers, and each weight is nudged to reduce it. For example, RNNs "have been found to be very good at predicting the next character in the text, or next word in a sequence," and by such backpropagational adjustments, machine language translations have achieved greater levels of accuracy. (But why not complete accuracy? --read on.) The process "is more compatible with the view that everyday reasoning involves many simultaneous analogies that each contribute plausibility to a conclusion." In their review's conclusion, the authors expect "systems that use RNNs to understand sentences or whole documents will become much better when they learn strategies for selectively attending to one part at a time."
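For readers who, like me, find the weight-adjusting idea easier to grasp in miniature, here is a minimal sketch of backpropagation. It is not a recurrent network and has nothing to do with Yewno's actual software: it assumes the smallest possible "layered" model, two weights chained through a tanh unit, and the training data, starting weights, and learning rate are all invented for the illustration.

```python
import math


def forward(w1, w2, x):
    """Two 'layers': a tanh hidden unit, then a linear output unit."""
    hidden = math.tanh(w1 * x)
    return hidden, w2 * hidden


def train(samples, w1=0.5, w2=0.5, learning_rate=0.1, epochs=200):
    """Adjust both weights by backpropagation; return them plus the loss per epoch."""
    losses = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, target in samples:
            hidden, output = forward(w1, w2, x)
            error = output - target
            epoch_loss += 0.5 * error ** 2
            # Backward pass: the output error is propagated back through
            # each layer (the chain rule) to get a gradient for every weight.
            grad_w2 = error * hidden
            grad_hidden = error * w2
            grad_w1 = grad_hidden * (1.0 - hidden ** 2) * x
            # Each weight takes a small step in the direction that reduces the error.
            w1 -= learning_rate * grad_w1
            w2 -= learning_rate * grad_w2
        losses.append(epoch_loss)
    return w1, w2, losses


# Hypothetical training data, generated from a "true" model the learner never sees.
samples = [(x, 2.0 * math.tanh(1.5 * x)) for x in (-1.0, -0.5, 0.5, 1.0)]

if __name__ == "__main__":
    w1, w2, losses = train(samples)
    print(f"first-epoch loss {losses[0]:.4f}, last-epoch loss {losses[-1]:.6f}")
```

Nothing in the loop knows what the numbers mean; it only shrinks the gap between its output and the examples it was given, which is why the character of those examples matters so much.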

After all this, what do you know? Yewno presents the results of deep learning through recurrent neural networks that identify nonlinear concepts in a text, a kind of "knowledge." Hence Ruth Pickering can plausibly state:

Yewno's mission is "Knowledge Singularity" and by that we mean the day when knowledge, not information, is at everyone's fingertips. In the search and discovery space the problems that people face today are the overwhelming volume of information and the fact that sources are fragmented and dispersed. There's a great T.S. Eliot quote, "Where's the knowledge we lost in information" and that sums up the problem perfectly. (source: Meadows' post)

Ms. Pickering perhaps revealed more than she intended. Her quotation from T.S. Eliot is found in a much larger and quite different context:

Endless invention, endless experiment,
Brings knowledge of motion, but not of stillness;
Knowledge of speech, but not of silence;
Knowledge of words, and ignorance of the Word.
All our knowledge brings us nearer to our ignorance,
All our ignorance brings us nearer to death,
But nearness to death no nearer to GOD.
Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
The cycles of Heaven in twenty centuries
Bring us farther from GOD and nearer to the Dust. (Choruses from The Rock)

Eliot's interest is in the Life we have lost in living, and his religious and literary use of the word "knowledge" signals the puzzle at the very base of econophysics, machine learning, deep learning, and backpropagational algorithms. Deep learning performed by machines mimics what humans do, their forms of life. Pickering's "Knowledge Singularity" alludes to the semi-theological vision of Ray Kurzweil's millennialist "Singularity": a machine intelligence infinitely more powerful than all human intelligence combined. In other words, where Eliot is ultimately concerned with Wisdom, the Knowledge Singularity is ultimately concerned with Power. Power in the end means power over other people: otherwise it has no social meaning apart from simply more computing. Wisdom interrogates power, and questions its ideological supremacy.

For example, three researchers at the Center for Information Technology Policy at Princeton University have shown that "applying machine learning to ordinary human language results in human-like semantic biases." ("Semantics derived automatically from language corpora contain human-like biases," Science 14 April 2017, Vol. 356, issue 6334: 183-186, doi: 10.1126/science.aal4230). The results of their replication of a spectrum of known biases (measured by the Implicit Association Test) "indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names." Their approach holds "promise for identifying and addressing sources of bias in culture, including technology." The authors laconically conclude, "caution must be used in incorporating modules constructed via unsupervised machine learning into decision-making systems." Power resides in such decisions about other people, resources, and time.

Arvind Narayanan, who published the paper with Aylin Caliskan and Joanna J. Bryson, noted that "we have a situation where these artificial-intelligence systems may be perpetuating historical patterns of bias that we might find socially unacceptable and which we might be trying to move away from." Princeton researchers developed an experiment with a program called GloVe that replicated the Implicit Association Test in a machine-learning representation of co-occurrent words and phrases. Researchers at Stanford turned this loose on roughly 840 billion words from the Web, and looked for co-occurrences and associations of words such as "man, male" or "woman, female" with "programmer, engineer, scientist, nurse, teacher, librarian." They showed familiar biases in distributions of associations, biases that can "end up having pernicious, sexist effects."
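How such an association can be measured is easier to see in a toy case. The sketch below is a simplified version of the kind of comparison the researchers describe, not their exact statistic: it assumes each word is a vector, and scores a target word by how much closer it sits (by cosine similarity) to one set of attribute words than to another. The two-dimensional vectors are invented for the illustration; real studies use vectors learned from billions of words.

```python
import math

# Hypothetical two-dimensional word vectors, invented for this illustration.
# Real embeddings such as GloVe have hundreds of dimensions learned from text.
vectors = {
    "he": (1.0, 0.1),
    "man": (0.9, 0.2),
    "she": (0.1, 1.0),
    "woman": (0.2, 0.9),
    "engineer": (0.8, 0.3),
    "programmer": (0.7, 0.4),
    "nurse": (0.3, 0.8),
    "librarian": (0.4, 0.7),
}


def cosine(u, v):
    """Cosine similarity: 1.0 for the same direction, 0.0 for unrelated directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, set_a, set_b):
    """Mean similarity to set_a minus mean similarity to set_b.

    Positive means the word sits closer to set_a; negative, closer to set_b.
    """
    mean_a = sum(cosine(vectors[word], vectors[a]) for a in set_a) / len(set_a)
    mean_b = sum(cosine(vectors[word], vectors[b]) for b in set_b) / len(set_b)
    return mean_a - mean_b


male_terms = ["he", "man"]
female_terms = ["she", "woman"]

if __name__ == "__main__":
    for occupation in ("engineer", "programmer", "nurse", "librarian"):
        score = association(occupation, male_terms, female_terms)
        print(f"{occupation:>10}: {score:+.3f}")
```

In this toy the skew was put there by hand; in the study it was found already present in vectors learned from ordinary Web text, which is the whole point.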

For example, machine-learning programs can translate foreign languages into sentences that reflect or reinforce gender stereotypes. Turkish uses a gender-neutral, third person pronoun, "o." Plugged into the online translation service Google Translate, however, the Turkish sentences "o bir doktor" and "o bir hemşire" are translated into English as "he is a doctor" and "she is a nurse." . . . "The biases that we studied in the paper are easy to overlook when designers are creating systems," Narayanan said. (Source: Princeton University, "Biased Bots" by Adam Hadhazy.)

Yewno is exactly such a system insofar as it mimics human forms of life which include, alas, the reinforcement of biases and prejudice. So in the end, do you know Yewno, and if Yewno, exactly what do you know? --that "exactly what" will likely contain machine-generated replications of problematic human biases. Machine translations will never offer perfect, complete translations of languages because language is never complete --humans will always use it in new ways, with new shades of meaning and connotations of plausibility, because humans go on living in their innumerable, linguistic forms of life. Machines have to map language within language (here I include mathematics as a kind of language with distinctive games and forms of life). No "Knowledge Singularity" can occur outside of language, because it will be made of language: but the ideology of "Singularity" can conceal its origins in many forms of life, and thus appear "natural," "inevitable," and "unstoppable."


This journey through my "undiscovered summer reading," from the Antikythera mechanism to the alleged "Knowledge Singularity," has reinforced my daily, functional belief that knowing is truly something that humans do within language and through language, and that the quest which makes human life human is careful attention to the forms of human life, and the way that language, mathematics, and silence are woven into and through those forms. The techno-solutionism inherent in educational technology and library information technology --no matter how sophisticated-- cannot undo the basic puzzle of human life: how do we individually and socially find the world? (Find: in the sense of locating, of discovering, and of characterizing.) Yewno will not lead to a Knowledge Singularity, but to derived bias and reproduced injustice, unless we acknowledge its limitations within language.

The promise of educational and information technology becomes more powerful when approached with modesty: there are no quick, technological solutions to puzzles of education, of finance, of information discovery, of "undiscovered public knowledge." What those of us who are existentially involved with the much-maligned, greatly misunderstood, and routinely dismissed "liberal arts" can contribute is exactly what makes those technologies humane: a sense of modesty, proportion, generosity, and silence. Even to remember those at this present moment is a profoundly counter-cultural act, a resistance to the techno-ideology of unconscious bias and entrenched injustice.

Is Undiscovered Public Knowledge A Problem or a Puzzle?

Gavin Ferriby, June 19, 2017

(This is the first of three posts about my semi-serendipitous summer reading; here are links to posts two and three.)

This last week I was seized by a strange mania: clean the office. I have been in my current desk and office since 2011 (when a major renovation disrupted it for some months). It was time to clean --spurred by notice that boxes of papers would be picked up for the annual certified, assured shredding. I realized I had piles of FERPA-protected paperwork (exams, papers, 1-1 office hours memos, you name it). Worse: my predecessor had left me large files that I hadn't looked at in seven years, and that contained legal papers, employee annual performance reviews, old resumes, consultant reports, accreditation documentation, etc. Time for it all to go! I collected six large official boxes (each twice the size of a paper ream), but didn't stop there: I also cleaned the desk; cleaned up the desktop; recycled odd electronic items, batteries, and lightbulbs; and forwarded a very large number of vendor advertising pens to a cache for our library users ("do you have a pen?"). On Thursday I was left with the moment-after: I cleared it all out: now what?

The "what" turned out to be various articles I had collected and printed for later reading, and then never actually read --some more recent, some a little older. (This doesn't count the articles I recycled as no longer relevant or particularly interesting; my office is not a bibliography in itself.) Unintentionally, several of these articles wove together concerns that have been growing in the back of my mind --and have been greatly pushed forward by the events of the past year (Orlando--Bernie Sanders--the CombOver--British M.P. Jo Cox--seem as distant and similar as events of the late Roman republic now, pace Mary Beard.)

"Undiscovered public knowledge" seems an oxymoron (but less one than "Attorney General Jeff Sessions"). If "public," then why "undiscovered"? It means the knowledge that once was known by someone, recorded, properly interred in some documentary vault, and left unexamined and undiscovered by anyone else. The expression is used in Adrienne LaFrance's Searching for Lost Knowledge in the Age of Intelligent Machines, published in The Atlantic, December 1, 2016. Her leading example is the fascinating story of the Antikythera mechanism, some sort of ancient time-piece surfaced from an ancient, submerged wreck off Antikythera (a Greek island between the Peloponnese and Crete, known also as Aigila or Ogylos). It sat in a crate outside the National Archaeological Museum in Athens for a year, and then was largely forgotten by all but a few dogged researchers, who pressed on for decades with the attempt to figure out exactly what it is.

The Antikythera mechanism has only come to be understood when widely separated knowledge has been combined by luck, persistence, intuition, and conjecture. How did such an ancient time piece come about, who made it, based upon which thinking, from where? It could not have been a one-off, but it seems to be a unique lucky find from the ancient world, unless other mechanisms or pieces are located elsewhere in undescribed or poorly described collections. For example, a 10th-century Arabic manuscript suggests that such a mechanism may have influenced the development of modern clocks, and in turn built upon ancient Babylonian astronomical data. (For more see Josephine Marchant's Decoding the heavens : a 2,000-year-old computer--and the century-long search to discover its secrets, Cambridge, Mass.: DaCapo Press, 2009: Worldcat ; Sacred Heart University Library). Is there "undiscovered public knowledge" that would include other mechanisms, other clues to its identity, construction, development, and influence?

"Undiscovered public knowledge" is a phrase made modestly famous by Don R. Swanson in an article by the same name in The Library Quarterly, 1986. This interesting article is a great example of the way that library knowledge and practice tends to become isolated in the library silo, when it might have benefited many others located elsewhere. (It is also a testimony to the significant, short-sighted mistake made by the University of Chicago, Columbia University, and others, in closing their library science programs in the 1980s-1990s just when such knowledge was going public in Yahoo, Google, Amazon, GPS applications and countless other developments.) Swanson's point is that "independently created fragments are logically related but never retrieved, brought together, and interpreted." The "essential incompleteness" of search (or now: discovery) makes "possible and plausible the existence of undiscovered public knowledge." (to quote the abstract --the article is highly relevant and well developed). Where Swanson runs into trouble, however, is his use of Karl Popper's distinction between subjective and objective knowledge, the critical approach within science that distinguishes between "World 2" and "World 3." (Popper's Three Worlds (.pdf), lectures at the University of Michigan in 1978, were a favorite of several of my professors at Columbia University School of Library Service; Swanson's article in turn was published and widely read while I was studying there.)

Popper's critical worlds (1: physical objects and events, including biological; 2: mental objects and events; 3: objective knowledge, a human but not Platonic zone) both enable the deep structures of information science as now practiced by our digital overlords and signal their fatal flaw. They do this (enable the deep structures and algorithms of "discovery") by assuming the link between physical objects and events, mental objects, and objective knowledge symbolically notated (language, mathematics). Simultaneously Popper's linkage also signals their fatal flaw: such language (and mathematics) is used part-and-parcel in innumerable forms of human life and their language "games," where the link between physical objects, mental objects, and so-called objective knowledge is puzzling, as well as a never-ending source of philosophical delusion.

To sum up: Google thinks its algorithm is serving up discoveries of objective realities, when it is really extending the form of life called "algorithm" --no "mere" here, but in fact an ideological extension of language that conceals its power relations and manufactures the assumed sense that such discovery is "natural." It is au contraire a highly developed, very human form of life parallel to, and participating in, innumerable other forms of life, and just as subject to their foibles, delusions, illogic, and mistakes as any other linguistic form of life. There is no "merely" (so-called "nothing-buttery") to Google's ideological extension: it is very powerful and seems, at the moment, to rule the world. Like every delusion, however, it could fall "suddenly and inexplicably," like an algorithmic Berlin Wall, and "no one could have seen it coming" --because of the magnificent illusion of ideology (as in the Berlin Wall, ideology on both sides, as well, upheld by both the CIA and the KGB).

This is once again to rehearse the crucial difference between Popper's and Wittgenstein's understandings of science and knowledge. A highly relevant text is the lucid, short Wittgenstein's Poker: The Story of a Ten-Minute Argument Between Two Great Philosophers, (by David Edmonds and John Eidinow, Harper Collins, 2001; Worldcat). Wittgenstein: if we can understand the way language works from within language (our only vantage point), most philosophical problems will disappear, and we are left with puzzles and mis-understandings that arise when we use improperly the logic of our language. Popper: Serious philosophical problems exist with real-world consequences, and a focus upon language only "cleans its spectacles" to enable the wearer to see the world more clearly. (The metaphor is approximately Popper's; this quick summary will undoubtedly displease informed philosophers, and I beg their forgiveness, for the sake of brevity.)

For Wittgenstein, if I may boldly speculate, Google would only render a reflection of ourselves, our puzzles, mis-understandings, and mistakes. Example: search "white girls," then clear the browser of its cookies (this is important), and search "black girls." Behold the racial bias. The difference in Google's search results points to machine-reproduced racism that would not have surprised Wittgenstein, but seems foreign to Popper's three worlds. Google aspires to Popper's claims of objectivity, but behaves very differently --at least, its algorithm does. No wonder its algorithm has taken on the aura of an ancient deity: it serves weal and woe without concern for the fortunes of dependent mortals. Except . . . it's a human construct.

So, Swanson's article identifies and makes plausible "undiscovered public knowledge" because of the logical and essential incompleteness of discovery (what he called "search"): discovery signals a wide variety of human forms of life, and no algorithm can really anticipate them. The Antikythera mechanism, far from an odd example, is a pregnant metaphor for the poignant frailties of human knowledge and humans' drive to push past their limits. Like the Archimedes palimpsest, "undiscovered public knowledge" is one of the elements that makes human life human --without which we become, like the Q Continuum in Star Trek: The Next Generation, merely idle god-like creatures of whim and no moral gravitas whatsoever. The frailty of knowledge --that it is made up of innumerable forms of human life, which have to be lived by humans rather than algorithms-- gives the human drive to know its edge, and its tragedy. A tragic sense of life, however, is antithetical to the tech-solutionist ideology of the algorithm.

(Continued in the second post, Undiscovered Summer Reading)
