Revealing Algorithmic Rankers


By Julia Stoyanovich (Assistant Professor of Computer Science, Drexel University) and Ellen P. Goodman (Professor, Rutgers Law School).  This post is derived from their recent Freedom to Tinker post.

ProPublica’s story on “machine bias” in an algorithm used for sentencing defendants amplified calls to make algorithms more transparent and accountable.  It has never been clearer that algorithms are political (Gillespie) and embody contested choices (Crawford), and that these choices are largely obscured from public scrutiny (Pasquale and Citron).  We see it in controversies over Facebook’s newsfeed, Google’s search results, and Twitter’s trending topics.  Policymakers are considering how to operationalize “algorithmic ethics,” and scholars are calling for accountable algorithms (Kroll, et al.).

One kind of algorithm that is at once especially obscure, powerful, and common is the ranking algorithm (Diakopoulos).  Algorithms rank individuals to determine creditworthiness, desirability for college admissions and employment, and compatibility as dating partners. They rank countries and companies for sustainability, human rights, transparency and freedom of expression. They encode norms for what counts as the best schools, neighborhoods, societies, businesses, and technologies. Despite their importance, we often know very little about why high-rankers are coming out on top.  Stakeholders are in the dark: those who are ranked, those who use the rankings, and the public whose world the rankings may shape.

Many rankers, such as Google’s page rank, do not disclose what precisely they are seeking to measure or what methods they use to do it.  Rankers justify this kind of intentional opacity as a defense against manipulation and gaming.

Some rankers partially reveal the logic of their algorithms by disclosing factors and relative weights.  An example is the US News ranking of colleges.  These rankers engage in what we might call syntactic transparency.  But even with this degree of transparency, there remain significant degrees of opacity, as explained below.  In cases where syntactic transparency is considered impossible, and even where it exists, we advocate for an alternative goal of interpretability, which rests on making explicit the interactions between the program and the data on which it acts. An interpretable algorithm allows stakeholders to understand the outcomes, not merely the process by which outcomes were produced.

Opacity in Algorithmic Rankers

The simplest kind of ranker is a score-based ranker, which applies a scoring function independently to each item and then sorts the items by their scores.  These rankers can produce relatively opaque results for the following reasons.
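To make the definition concrete, here is a minimal sketch of a score-based ranker. The department names, attribute values, and weights are invented for illustration and do not come from any real ranking.

```python
# Hypothetical departments and attribute values, invented for illustration.
DEPARTMENTS = {
    "A": {"faculty": 40, "pubs": 3.0, "gre": 160},
    "B": {"faculty": 25, "pubs": 4.5, "gre": 165},
    "C": {"faculty": 60, "pubs": 2.0, "gre": 158},
}


def linear_score(attrs):
    """Assumed scoring function: a weighted sum with made-up weights."""
    return 0.2 * attrs["faculty"] + 0.5 * attrs["pubs"] + 0.3 * attrs["gre"]


def rank(items, score):
    """Score each item independently, then sort by score, highest first."""
    scored = [(name, score(attrs)) for name, attrs in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


ranking = rank(DEPARTMENTS, linear_score)
order = [name for name, _ in ranking]
```

Each item's score depends only on its own attributes; its position depends on every other item in the input.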

Source 1: The scoring formula alone does not indicate the relative rank of an item. Rankings are, by definition, relative, while scores are absolute. Knowing how the score of an item is computed says little about the outcome — the position of a particular item in the ranking, relative to other items.  Is 10.5 a high score or a low score?  That depends on how 10.5 compares to the scores of other items, for example to the highest attainable score and to the highest score of some actual item in the input.
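A small sketch of this point: the same score of 10.5 lands in very different positions depending on the other scores in the input. Both score lists are invented for illustration.

```python
def rank_of(score, all_scores):
    """1-based position of `score`: one plus the number of strictly higher scores."""
    return 1 + sum(1 for s in all_scores if s > score)


# Two hypothetical datasets that both contain an item scoring 10.5.
dataset_low = [10.5, 9.8, 7.2, 6.0, 3.1]
dataset_high = [19.7, 18.2, 15.0, 12.4, 10.5]

rank_in_low = rank_of(10.5, dataset_low)    # best item in this dataset
rank_in_high = rank_of(10.5, dataset_high)  # worst item in this dataset
```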

Source 2: The weight of an attribute in the scoring formula does not determine its impact on the outcome.  For example, consider the ranking of academic programs that weighs faculty size, average publication count, and GREs, among other factors.  The algorithm might allocate least weight to faculty size, and even disclose that weight, but that factor could end up being the deciding factor that sets apart top-ranked departments from those in lower ranks.  This would happen if that factor were the most variable, or if it were correlated with other factors that amplified its effect.  In other words, what actually turns out to be important may not be what syntactic transparency would reveal.
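The following sketch illustrates the variability case under simplifying assumptions: all attributes are pre-normalized to [0, 1], the data and weights are invented, and "impact" is approximated crudely as weight times observed range. It is not a method proposed by the authors.

```python
# Hypothetical departments; every attribute is pre-normalized to [0, 1].
departments = {
    "D1": {"faculty": 0.9, "pubs": 0.80, "gre": 0.70},
    "D2": {"faculty": 0.6, "pubs": 0.81, "gre": 0.70},
    "D3": {"faculty": 0.3, "pubs": 0.80, "gre": 0.71},
    "D4": {"faculty": 0.1, "pubs": 0.81, "gre": 0.71},
}

# Assumed weights: faculty size receives the least weight.
weights = {"faculty": 0.10, "pubs": 0.45, "gre": 0.45}


def score(attrs):
    return sum(weights[a] * attrs[a] for a in weights)


by_score = sorted(departments, key=lambda d: score(departments[d]), reverse=True)
by_faculty = sorted(departments, key=lambda d: departments[d]["faculty"], reverse=True)

# Crude impact proxy (an assumption): weight times the attribute's observed range.
spread = {
    a: weights[a]
    * (max(d[a] for d in departments.values()) - min(d[a] for d in departments.values()))
    for a in weights
}
most_influential = max(spread, key=spread.get)
```

Because publication counts and GREs barely vary in this toy dataset, the ranking by full score coincides with the ranking by faculty size alone, even though faculty size has the smallest disclosed weight.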

Source 3: The ranking output may be unstable.  A ranking may be unstable because of the scores generated on a particular dataset. An example would be tied scores, where the tie is not reflected in the ranking.  Syntactic transparency and accessible data together allow us to see the instability, but having both is unusual.
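When scores are available, ties and near-ties are easy to surface. The sketch below flags adjacent positions whose scores differ by no more than a tolerance; the scores and the tolerance are invented for illustration.

```python
def near_ties(scores_desc, tolerance):
    """Return pairs of adjacent 1-based positions whose score gap is <= tolerance.

    `scores_desc` must already be sorted from highest to lowest.
    """
    return [
        (i + 1, i + 2)
        for i in range(len(scores_desc) - 1)
        if scores_desc[i] - scores_desc[i + 1] <= tolerance
    ]


# Hypothetical scores, highest first.
scores = [9.1, 8.4, 8.4, 7.0, 6.99, 5.2]
unstable_positions = near_ties(scores, tolerance=0.05)
```

Here positions 2 and 3 are an exact tie and positions 4 and 5 are separated by a negligible margin, yet a plain ranked list would present all of them as strictly ordered.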

Source 4: The ranking methodology may be unstable. The scoring function may produce vastly different rankings with small changes in attribute weights.  This is difficult to detect even with syntactic transparency, and even if the data is public.  Malcolm Gladwell discusses this issue and gives compelling examples in his 2011 piece, The Order of Things.
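A toy sketch of this sensitivity, with invented items and a two-attribute weighted sum as the assumed scoring function: moving the weight from 0.52 to 0.48 reverses the entire ranking.

```python
# Hypothetical items with two normalized attributes (x, y).
items = {"A": (0.9, 0.1), "B": (0.5, 0.5), "C": (0.1, 0.9)}


def ranking(w):
    """Rank items by w * x + (1 - w) * y, highest first."""
    scores = {name: w * x + (1 - w) * y for name, (x, y) in items.items()}
    return sorted(scores, key=scores.get, reverse=True)


def swapped_pairs(r1, r2):
    """Count item pairs whose relative order differs between two rankings."""
    pos1 = {name: i for i, name in enumerate(r1)}
    pos2 = {name: i for i, name in enumerate(r2)}
    names = list(r1)
    return sum(
        1
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if (pos1[names[i]] - pos1[names[j]]) * (pos2[names[i]] - pos2[names[j]]) < 0
    )


before = ranking(0.52)
after = ranking(0.48)
flips = swapped_pairs(before, after)
```

Counting swapped pairs across a range of nearby weights is one simple way to probe methodological stability; how best to quantify it remains a judgment call.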

The opacity concerns described here are all due to the interaction between the scoring formula (or, more generally, an a priori postulated model) and the actual dataset being ranked.  In a recent paper, one of us observed that structured datasets show rich correlations between item attributes in the presence of ranking, and that such correlations are often local (i.e., are present in some parts of the dataset but not in others).  To be clear, this kind of opacity is present whether or not there is syntactic transparency.


Recent scholarship on the issue of algorithmic accountability has devalued transparency in favor of verification.  The claim is that because algorithmic processes are protean and extremely complex (due to machine learning) or secret (due to trade secrets or privacy concerns), we need to rely on retrospective checks to ensure that the algorithm is performing as promised.  Among these checks would be cryptographic techniques like zero-knowledge proofs (Kroll, et al.) to confirm particular features, audits (Sandvig) to assess performance, or reverse engineering (Perel & Elkin-Koren) to test cases.

These are valid methods of interrogation, but we do not want to give up on disclosure. Retrospective testing puts a significant burden on users.  Proofs are useful only when you know what you are looking for.  Reverse engineering with test cases can lead to confirmation bias. All these techniques put the burden of inquiry exclusively on individuals for whom interrogation may be expensive and ultimately fruitless.  The burden instead should fall more squarely on the least cost avoider, which will be the vendor who is in a better position to reveal how the algorithm works (even if only partially).  What if food manufacturers resisted disclosing ingredients or nutritional values, and instead we were put to the trouble of testing their products or asking them to prove the absence of a substance?  That kind of disclosure by verification is very different from having a nutritional label.

What would it take to provide the equivalent of a nutritional label for the process and the outputs of algorithmic rankers?  What suffices as an appropriate and feasible explanation depends on the target audience.

For an individual being ranked, a useful description would explain their specific ranked outcome and suggest ways to improve the outcome.  What attributes turned out to be most important to the individual’s ranking?  When working with data that is not public (e.g., involving credit or medical information about individuals), an explanation mechanism must be mindful of any privacy considerations.  Individually-responsive disclosures could be offered in a widget that allows ranked entities to experiment with the results by changing the inputs.
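The core of such a widget could be a "what if" query like the sketch below. The individuals, the attributes `x` and `y`, and the scoring function are all hypothetical, and the function reports only the querying individual's own position, as one simple way to avoid exposing other people's data.

```python
def position(scores, name):
    """1-based position of `name` when items are sorted by score, highest first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(name) + 1


def what_if(items, score, name, changes):
    """Return (current_position, position_after_changes) for one individual."""
    before = {n: score(attrs) for n, attrs in items.items()}
    modified = dict(items)
    modified[name] = {**items[name], **changes}  # copy, so the input is untouched
    after = {n: score(attrs) for n, attrs in modified.items()}
    return position(before, name), position(after, name)


# Hypothetical individuals with two normalized attributes.
people = {
    "ana": {"x": 0.4, "y": 0.9},
    "ben": {"x": 0.8, "y": 0.6},
    "cal": {"x": 0.5, "y": 0.5},
}


def equal_weights(attrs):
    """Assumed scoring function for the example."""
    return 0.5 * attrs["x"] + 0.5 * attrs["y"]


current, improved = what_if(people, equal_weights, "ana", {"x": 0.6})
```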

An individual consumer of a ranked output would benefit from a concise and intuitive description of the properties of the ranking. Based on this explanation, users will get a glimpse of, e.g., the diversity (or lack thereof) that the ranking exhibits in terms of attribute values.  Both attributes that make up the scoring function, if known (or, more generally, features that are part of the model), and attributes that co-occur or even correlate with the scoring attributes, can be described explicitly.  Let’s again consider a ranking of academic departments in a field that places a huge emphasis on faculty size.  We might want to understand how a ranking on average publication count will over-represent large departments (with large faculties) at the top of the list, while GRE does not strongly influence rank.



Figure 1: A hypothetical Ranking Facts label.

Figure 1 presents a hypothetical “nutritional label” for rankings.  Inspired by Nutrition Facts, our Ranking Facts label is aimed at the consumer, such as a prospective program applicant, and addresses three of the four opacity sources described above: relativity, impact, and output stability.  We do not address methodological stability in the label.  How this dimension should be quantified and presented to the user is an open technical problem.

The Ranking Facts show how the properties of the 10 highest-ranked items compare to the entire dataset (Relativity), making explicit cases where the ranges of values, and the median value, are different at the top-10 vs. overall (median is marked with red triangles for faculty size and average publication count).  The label lists the attributes that have most impact on the ranking (Impact), presents the scoring formula (if known), and explains which attributes correlate with the computed score.  Finally, the label graphically shows the distribution of scores (Stability), explaining that scores differ significantly within the top 10 but are nearly indistinguishable in later positions.
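The Relativity portion of such a label could be computed along the following lines. This is a sketch with invented data; it uses the top 3 of 6 items rather than a top 10 to keep the example short, and the field names are assumptions.

```python
from statistics import median


def relativity_summary(items, attribute, k):
    """Compare an attribute's range and median in the top-k items vs. all items."""
    ranked = sorted(items, key=lambda it: it["score"], reverse=True)
    top = [it[attribute] for it in ranked[:k]]
    overall = [it[attribute] for it in ranked]
    return {
        "top_k_range": (min(top), max(top)),
        "top_k_median": median(top),
        "overall_range": (min(overall), max(overall)),
        "overall_median": median(overall),
    }


# Hypothetical departments with a precomputed score and a faculty size.
departments = [
    {"score": 95, "faculty": 70},
    {"score": 90, "faculty": 65},
    {"score": 85, "faculty": 50},
    {"score": 60, "faculty": 30},
    {"score": 55, "faculty": 45},
    {"score": 50, "faculty": 20},
]

summary = relativity_summary(departments, "faculty", k=3)
```

In this toy data the median faculty size among the top-ranked items is well above the overall median, which is the kind of gap the label is meant to make visible.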

Something like the Ranking Facts label makes the process and outcome of algorithmic ranking interpretable for consumers, and reduces the likelihood of the opacity harms discussed above.  Beyond Ranking Facts, it is important to develop interpretability tools that enable vendors to design fair, meaningful and stable ranking processes, and that support external auditing.  Promising technical directions include, e.g., quantifying the influence of various features on the outcome under different assumptions about availability of data and code, and investigating whether provenance techniques can be used to generate explanations.

Posted in News

Pleading Standards: The Hidden Threat to Actavis

In FTC v. Actavis, the Supreme Court issued one of the most important antitrust decisions in the modern era. It held that a brand drug company’s payment to a generic firm to settle patent litigation and delay entering the market could violate the antitrust laws.

Since the decision, courts have analyzed several issues, including causation, the role of the patent merits, and whether “payment” is limited to cash. But one issue — the pleading requirements imposed on plaintiffs — has slipped under the radar. This issue has the potential to undercut antitrust law, particularly because settlements with payment and delayed entry today typically do not take the form of cash. In my latest piece, I explore these issues.

Several courts have imposed unprecedented hurdles. For example, the district court in In re Effexor XR Antitrust Litigation failed to credit allegations that a generic delayed entering the market because a brand promised not to introduce its own “authorized generic” that would have dramatically reduced the true generic’s revenues. The same judge, in In re Lipitor Antitrust Litigation, dismissed a complaint despite allegations that the generic delayed entry in return for the brand’s forgiveness of hundreds of millions of dollars in potential damages in separate litigation.

This essay first introduces the Supreme Court’s Actavis decision. It then discusses the pleading standards articulated by the Court in Bell Atlantic v. Twombly and Ashcroft v. Iqbal. Turning to the cases that applied excessively high pleading requirements, it next focuses on the Effexor and Lipitor cases. Finally, it analyzes the settlement cases that applied a more justifiable analysis.

The essay concludes that the imposition of excessive standards, as was done by the Effexor and Lipitor courts, threatens to overturn established pleading standards and undercut the landmark Actavis decision. Such a result would significantly weaken the antitrust analysis of potentially anticompetitive settlements.

Posted in News

Why a ‘Large and Unjustified’ Payment Threshold is Not Consistent with Actavis

FTC v. Actavis was a landmark antitrust decision. In rejecting the “scope of the patent” test that had immunized settlements by which brand-name drug firms pay generic companies to delay entering the market (“exclusion payment settlements”), the Supreme Court made clear that such agreements “tend to have significant adverse effects on competition” and could violate the antitrust laws.

Some lower courts and defendants have sought to sow ambiguity in the post-Actavis caselaw by creating new thresholds and frameworks not articulated or envisioned by the Court. In particular, they have latched onto the discussion in Actavis of a “large and unjustified” payment. The district court in In re Loestrin 24 FE Antitrust Litigation, for example, imposed a framework that required analysis of (1) whether “there [is] a reverse payment” and (2) whether “that reverse payment is large and unjustified” before addressing (3) the rule of reason. The Loestrin court borrowed this framework from the district court in In re Lamictal Direct Purchaser Antitrust Litigation. And defendants have contended, for example, that “Actavis requires a plaintiff challenging a reverse-payment settlement . . . to prove, as a threshold matter, that the . . . payment was both large and unjustified” and that “under Actavis, [plaintiffs] have to prove that [a] payment was ‘large’ (as well as unexplained).”

This article offers three reasons why a requirement that a plaintiff demonstrate a large and unjustified payment before reaching the Rule of Reason is not consistent with Actavis. First, nearly all of the Court’s discussion of large and unjustified payments occurred in contexts having little to do with the antitrust analysis that future courts were to apply. Second, the Court instructed lower courts to apply the Rule of Reason, not a new framework with a threshold it never mentioned. And third, such a threshold is inconsistent with the Court’s (1) allowance of shortcuts for plaintiffs to show anticompetitive effects and market power and (2) imposition of the burden on defendants to show justifications for a payment.

Posted in News

Cyber Law and Policy: A Framework for the 21st Century

Posted in Events, News
