bias – The Extensible Librarian

This is the third of "undiscovered summer reading" posts, see also the first and second.

At the recent Association of College and Research Libraries conference Baltimore I came across Yewno, a search-engine-like discovery or exploration layer that I had heard about. I suspect that Yewno or something like it could be the "next big thing" in library and research services. I have served as a librarian long enough both to be very interest, and to be wary at the same time --so many promises have been made by the information technology commercial sector and the reality fallen far short --remember the hype about discovery services?

Yewno is a so-called search app; it "resembles as search engines --you use it to search for information, after all--but its structure is network-like rather than list-based, the way Google's is. The idea is to return search results that illustrate relationships between relevant sources" --mapping them out graphically (like a mind map). Those words are quoted from Adrienne LaFrance's Atlantic article on growing understanding of the Antikythera mechanism as an example of computer-assisted associative thinking (see, all these readings really come together). LaFrance traces the historical connections between "undiscovered public knowledge," Vannevar Bush's Memex (machine) in the epochal As We May Think, and Yewno. The hope is that through use of an application such as Yewno, associations could be traced between ancient time-keeping, Babylonian and Arabic mathematics, medieval calendars, astronomy, astrological studies, ancient languages, and other realms of knowledge. At any rate, that's the big idea, and it's a good one.

So who is Yewno meant for, a what's it based on?

Lafrance notes that Yewno "was built primarily for academic researchers," but I'm not sure that's true, strictly. When I visited the Yewno booth at ACRL, I thought several things at once: 1) this could be very cool; 2) this could actually be useful; 3) this is going to be expensive (though I have neither requested nor received a quote); and 4) someone will buy them, probably Google or another technology octopus. (Subsequent thought: where's Google's version of this?) I also thought that intelligence services and corporate intelligence advisory firms would be very, very interested --and indeed they are. Several weeks later I read Alice Meadows' post, "Do You Know About Yewno?" on the Scholarly Kitchen blog, and her comments put Yewno in clearer context. (Had I access to Yewno, I would have searched, "yewno.")

Yewno is a start-up venture by Ruggero Gramatica (if you're unclear, that's a person), a research strategist with a background in applied mathematics (Ph.D. King's College, London) and M.B.A. (University of Chicago). He is first-named author of "Graph Theory Enables Drug Repurposing," a paper (DOI) on PLOS One that introduces:

a methodology to efficiently exploit natural-language expressed biomedical knowledge for repurposing existing drugs towards diseases for which they were not initially intended. Leveraging on developments in Computational Linguistics and Graph Theory, a methodology is defined to build a graph representation of knowledge, which is automatically analysed to discover hidden relations between any drug and any disease: these relations are specific paths among the biomedical entities of the graph, representing possible Modes of Action for any given pharmacological compound. We propose a measure for the likeliness of these paths based on a stochastic process on the graph.

Yewno does the same thing in other contexts:

an inference and discovery engine that has applications in a variety of fields such as financial, economics, biotech, legal, education and general knowledge search. Yewno offers an analytics capability that delivers better information and faster by ingesting a broad set of public and private data sources and, using its unique framework, finds inferences and connections. Yewno leverages on leading edge computational semantics, graph theoretical models as well as quantitative analytics to hunt for emerging signals across domains of unstructured data sources. (source: Ruggero Gramatica's LinkedIn profile)

This leads to several versions of Yewno: Yewno Discover, Yewno Finance, Yewno Life Sciences, and Yewno Unearth. Ruth Pickering, the companies co-founder and CEO of Business Development & Strategy Officer, comments, "each vertical uses a specific set of ad-hoc machine learning based algorithms and content. The Yewno Unearth product sits across all verticals and can be applied to any content set in any domain of information." Don't bother calling the NSA --they already know all about it (and probably use it, as well).

Yewno Unearth is relevant to multiple functions of publishing: portfolio categorization, the ability to spot gaps in content, audience selection, editorial oversight and description, and other purposes for improving a publisher's position, both intellectually and in the information marketplace. So Yewno Discovery is helpful for academics and researchers, but the whole of Yewno is also designed to relay more information about them to their editors, publishers, funders, and those who will in turn market publications to their libraries. Elsevier, Ebsco, and ProQuest will undoubtedly appear soon in librarians' offices with Yewno-derived information, and that encounter likely could prove to be truly intimidating. So Yewno might be a very good thing for a library, but not simply an unalloyed very good thing.

So what is Yewno really based on? The going gets more interesting.

Meadows notes that Yewno's underlying theory emerged from the field of complex systems at the foundational level of econophysics, an inquiry "aimed at describing economic and financial cycles utilized mathematical structures derived from physics." The mathematical framework, involving uncertainty, stochastic (random probability distribution) processes and nonlinear dynamics, came to be applied to biology and drug discovery (hello, Big Pharma). This kind of information processing is described in detail in a review article, Deep Learning in Nature (Vol. 521, 28 May 2015, doi10.1038/nature14539). Developing machine learning, deep learning "allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction." Such deep learning "discovers intricate structure in are data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer." Such "deep convolutional nets" have brought about significant break-throughs when processing images, video, speech, and "recurrent nets" have brought new learning powers to "sequential data such as text and speech."

The article goes on in great detail, and I do not pretend I understand very much of it. Its discussion of recurrent neural networks (RNNs), however, is highly pertinent to libraries and discovery. The backpropagational algorithm is basically a process that adjusts the weights used in machine analysis while that analysis is taking place. For example, RNNs "have been found to be very good at predicting the next character in the text, or next word in a sequence," and by such backpropagational adjustments, machine language translations have achieved greater levels of accuracy. (But why not complete accuracy? --read on.) The process "is more compatible with the view that everyday reasoning involves many simultaneous analogies that each contribute plausibility to a conclusion." In their review's conclusion, the authors expect "systems that use RNNs to understand sentences or whole documents will become much better when they learn strategies for selectively attending to one part at a time."

After all this, what do you know? Yewno presents the results of deep learning through recurrent neural networks that identify nonlinear concepts in a text, a kind of "knowledge." Hence Ruth Pickering can plausibly state:

Yewno's mission is "Knowledge Singularity" and by that we mean the day when knowledge, not information, is at everyone's fingertips. In the search and discovery space the problems that people face today are the overwhelming volume of information and the fact that sources are fragmented and dispersed. There' a great T.S. Eliot quote, "Where's the knowledge we lost in information" and that sums up the problem perfectly. (source: Meadows' post)

Ms. Pickering perhaps revealed more than she intended. Her quotation from T.S. Eliot is found in a much larger and quite different context:

Endless invention, endless experiment,
Brings knowledge of motion, but not of stillness;
Knowledge of speech, but not of silence;
Knowledge of words, and ignorance of the Word.
All our knowledge brings us nearer to our ignorance,
All our ignorance brings us nearer to death,
But nearness to death no nearer to GOD.
Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
The cycles of Heaven in twenty centuries
Bring us farther from GOD and nearer to the Dust. (Choruses from The Rock)

Eliot's interest is in the Life we have lost in living, and his religious and literary use of the word "knowledge" signals the puzzle at the very base of econophysics, machine learning, deep learning, and backpropagational algorithms. Deep learning performed by machines mimics what humans do, their forms of life. Pickering's "Knowledge Singularity" alludes to the semi-theological vision of the Ray Kurzweil's millennialist "Singularity;" a machine intelligence infinitely more powerful than all human intelligence combined. In other words, where Eliot is ultimately concerned with Wisdom, the Knowledge Singularity is ultimately concerned with Power. Power in the end means power over other people: otherwise it has no social meaning apart from simply more computing. Wisdom interrogates power, and questions its ideological supremacy.

For example, three researchers at the Center for Information Technology Policy at Princeton University have shown that "applying machine learning to ordinary human language results in human-like semantic biases." ("Semantics derived automatically from language corpora contain human-like biases," Science 14 April 2017, Vol. 356, issue 6334: 183-186, doi 10.1126/science.aal4230). The results of their replication of a spectrum of know biases (measured by the Implicit Association Test) "indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as towards insects or flowers, problematic as race and gender, for even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Their approach holds "promise for identifying and addressing sources of bias in culture, including technology." The authors laconically conclude, "caution must be used in incorporating modules constructed via unsupervised machine learning into decision-making systems." Power resides in decisions such decisions about other people, resources, and time.

Arvind Narayanan, who published the paper with Aylin Caliskan and Joanna J. Bryson, noted that "we have a situation where these artificial-intelligence systems may be perpetuating historical patterns of bias that we might find socially unacceptable and which we might be trying to move away from." Princeton researchers developed an experiment with a program called GloVe that replicated the Implicit Association test in machine-learning representation of co-occurent words and phrases. Researchers at Stanford turn this loose on roughtly 840 billion words from the Web, and looked for co-occurences and associations of words such as "man, male" or "woman, female" with "programmer engineer scientist, nurse teacher, librarian." They showed familiar biases in distributions of associations, biases that can "end up having pernicious, sexist effects."

For example, machine-learning programs can translate foreign languages into sentences taht reflect or reinforce gender stereotypes. Turkish uses a gender-neutral, third person pronoun, "o." Plugged into the online translation service Google Translate, however, the Turkish sentence "o bir doktor" and "o bir hemşire" are translated into English as "he is a doctor" and "she is a nurse." . . . . "The Biases that we studied in the paper are easy to overlook when designers are creating systems," Narayanan said. (Source: Princeton University, "Biased Bots" by Adam Hadhazy.)

Yewno is exactly such a system insofar as it mimics human forms of life which include, alas, the reinforcement of biases and prejudice. So in the end, do you know Yewno, and if Yewno, exactly what do you know? --that "exactly what" will likely contain machine-generated replications of problematic human biases. Machine translations will never offer perfect, complete translations of languages because language is never complete --humans will always use it new ways, with new shades of meaning and connotations of plausibility, because human go on living in their innumerable, linguistic forms of life. Machines have to map language within language (here I include mathematics as kinds of languages with distinctive games and forms of life). No "Knowledge Singularity" can occur outside of language, because it will be made of language: but the ideology of "Singularity" can conceal its origins in many forms of life, and thus appear "natural," "inevitable," and "unstoppable."

The "Knowledge Singularity" will calcify bias and injustice in an everlasting status quo unless humans, no matter how comparatively deficient, resolve that knowledge is not a philosophical problem to be solved (such as in Karl Popper's Worlds 1, 2, and 3), but a puzzle to be wrestled with and contested in many human forms of life and language (Wittgenstein). Only by addressing human forms of life can we ever address the greater silence and the Life that we have lost in living. What we cannot speak about, we must pass over in silence (Wovon man nicht sprechen kann, darüber muss man schweigen, sentence 7 of the Tractatus) --and that silence, contra both the positivist Vienna Circle and Karl Popper (who was never part of it) is the most important part of human living. In the Tractatus Wittengenstein dreamt, as it were, a conclusive solution to the puzzle of language --but such a solution can only be found in the silence beyond strict logical (or machine) forms: a silence of the religious quest beyond the ethical dilemma (Kierkegaard).

This journey through my "undiscovered summer reading," from the Antikythera mechanism to the alleged "Knowledge Singularity," has reinforced my daily, functional belief that knowing is truly something that humans do within language and through language, and that the quest which makes human life human is careful attention to the forms of human life, and the way that language, mathematics, and silence are woven into and through those forms. The techno-solutionism inherent in educational technology and library information technology --no matter how sophisticated-- cannot undo the basic puzzle of human life: how do we individually and social find the world? (Find: in the sense of locating, of discovering, and of characterizing.) Yewno will not lead to a Knowledge Singularity, but to derived bias and reproduced injustice, unless we acknowledge its limitations within language.

The promise of educational and information technology becomes more powerful when approached with modesty: there are no quick, technological solutions to puzzles of education, of finance, of information discovery, of "undiscovered public knowledge." What those of us who are existentially involved with the much-maligned, greatly misunderstood, and routinely dismissed "liberal arts" can contribute is exactly what makes those technologies humane: a sense of modesty, proportion, generosity, and silence. Even to remember those at this present moment is a profoundly counter-cultural act, a resistance of the techno-idology of unconscious bias and entrenched injustice.

Tag: bias

Do You Know Yewno, and If Yewno, Exactly What Do You Know?

Interesting Blogs