Philippine Law Bytes: TheCyberLawyer Issues by Dr. Attorney Noel Guivani Ramiscal

In 2018, several researchers and scientists from the Advanced Science and Technology Institute (ASTI) of the Department of Science and Technology (DOST), namely Michael Benedict L. Virtucio, Jeffrey A. Aborot, John Kevin C. Abonita, Roxanne S. Avinante, Rother Jay B. Copino, Michelle P. Neverida, Vanesa O. Osiana, Elmer C. Peramo, Joanna G. Syjuco and Glenn Brian A. Tan (hereinafter, the authors), published a paper that should have made, at the very least, ripples in the Philippine judiciary and legal profession, but did not.

I came across the paper’s title and abstract in 2019 while updating my “Cyber Law Ethics” and “Judicial Ethics and Competence in Cyberspace” course modules. I searched online for a free copy of the paper but could not find one, so in February 2020 I contacted two of the authors and several people from ASTI; Mr. Aborot promptly responded. I first discussed the ramifications of this paper in my February 22, 2020 and March 5, 2020 lectures for the UPIAJ Mandatory Continuing Legal Education (MCLE) seminars for lawyers at UP Diliman.

Since I am not aware of any written critique of this significant paper, I have decided to set down my thoughts and publish them to enrich the knowledge base on this matter, open new vistas for discussion, and further the development of work in this area.

Disclosure: Prior to my February 22, 2020 and May 5, 2020 lectures, I emailed the DOST ASTI people several times with questions to clarify certain matters, but none of them responded.

Published in the 2018 42nd IEEE International Conference on Computer Software and Applications, the paper breaks new ground as the first scientific paper to subject selected online decisions of the Philippine Supreme Court to textual analysis using a Natural Language Processing (NLP) and Machine Learning (ML)-based approach, in order to supposedly predict the outcome of criminal cases appealed to the Supreme Court. The authors scraped Supreme Court decisions from 1987 to 2017 from the Chan Robles Virtual Law Library (chanrobles.com) and the Lawphil Project (lawphil.net), and narrowed their study to 6,483 cases. They created “datasets” using what is termed a “bag-of-words” model to analyze the content of the selected cases, which were classified into crimes against persons, crimes against property, crimes against public order, and drug-related crimes. The authors’ goal is to point toward a way of reducing the case backlog in Philippine courts. Per the statistics they present, the Philippines has an average of 1 million cases filed per year, and 4000 cases are filed per court per day!
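For readers unfamiliar with the term, a “bag-of-words” model reduces a text to a count of its words, discarding word order. The following is a minimal Python sketch of the general technique, not the authors’ actual code; the stop-word set and the sample sentence are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical miniature stop-word list for illustration only; the study's
# actual list (the NLTK English stop words plus the authors' own Tagalog list)
# is not reproduced here.
STOP_WORDS = {"the", "of", "a", "an", "is", "to", "and", "or"}

def bag_of_words(text: str) -> Counter:
    """Lowercase the text, keep only runs of ASCII letters, drop stop words,
    and count what remains. Word order is discarded: only frequencies survive."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in STOP_WORDS)

# A hypothetical snippet of a decision, not taken from any actual case.
decision = "The accused was convicted of murder. The conviction of the accused is affirmed."
bag = bag_of_words(decision)
print(bag.most_common(2))  # [('accused', 2), ('was', 1)]
```

Because only counts survive, “murder affirmed” and “affirmed murder” produce the same bag, which is the property the later sections of this critique turn on.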

The articulated purpose of the study can be found in the statement that “(d)eciding on cases is a complicated and time-consuming task. It requires the court staff to sift through various records to identify supporting statements for any possible case outcomes. The possible outcome with the strongest support based on the statements in a case decision will be deﬁned as the predicted outcome of the case.”
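One simple reading of “the possible outcome with the strongest support” is a plurality count: tally the statements supporting each outcome and pick the largest. The Python sketch below assumes each statement has already been labeled with the outcome it supports; that labeling step, the function name `predict_outcome`, and the labels are hypothetical, and this is not a description of the authors’ classifier.

```python
from collections import Counter

def predict_outcome(statement_labels: list[str]) -> str:
    """Return the outcome backed by the most supporting statements.

    Each entry is the (hypothetical) outcome label that one statement supports.
    Ties go to the outcome encountered first, as Counter.most_common orders
    equal counts by first appearance.
    """
    if not statement_labels:
        raise ValueError("no supporting statements to weigh")
    support = Counter(statement_labels)
    return support.most_common(1)[0][0]

# Hypothetical labels for statements extracted from one decision.
labels = ["affirmed", "reversed", "affirmed", "affirmed", "reversed"]
print(predict_outcome(labels))  # affirmed
```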

The use of algorithms, predictive coding software and analytics tools in the legal and judicial professions has been going on for quite some time. In Australia, several software products are used to help mediators and judges decide the division of marital assets of separated or divorcing spouses. In the U.S., law firms use several types of software for e-discovery. During my 2019 tour of Rio de Janeiro and São Paulo, Brazil, I learned that Brazilian judges have used software tools to help them decide certain cases, such as those involving traffic collisions.

Dr. Atty. Noel G. Ramiscal in São Paulo, Brazil, 2019

Dr. Atty. Noel G. Ramiscal in Selaron Steps, Lapa, Rio de Janeiro, Brazil, 2019

Critique: Necessity & Utility

As a former Executive Assistant to one of the Justices of the Court of Appeals, who later became a Supreme Court Justice, I fully appreciate the noble motive and the immense work the authors put into this study. At the outset, however, I note that the authors’ approach is not really novel.

The textual approach, though backed by science, follows an intuitive method used by legal researchers and drafters of decisions. Way back in the 1990s, I participated in the Philippine Jurisprudence Program of the Philippine Supreme Court for all court researchers. We were trained to use and query a computer database containing all the Supreme Court decisions at the time. We entered words or phrases relevant to our research into the database search engine, and pertinent cases were presented to us as an index containing the search words. By examining the cases individually, we could determine whether the reliefs prayed for were granted or denied, or whether the lower court’s ruling was affirmed or reversed. One can do the same kind of research using lawphil.net, Chan Robles, and even the Supreme Court website. There are software products available now that work in basically the same way.
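The keyword search described above can be sketched, very roughly, as an inverted index that maps each word to the decisions containing it, with a query returning the cases that contain all of its words. This is a minimal Python sketch under stated assumptions: the case titles and snippets are hypothetical, and `build_index` and `search` are illustrative names, not the Philippine Jurisprudence Program software or any website’s actual search engine.

```python
from collections import defaultdict

def build_index(decisions: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of case titles whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for title, text in decisions.items():
        for word in text.lower().split():
            # Strip surrounding punctuation so "affirmed." matches "affirmed".
            index[word.strip(".,;:()\"'")].add(title)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the titles of cases containing every word of the query (an AND search)."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

# Hypothetical case titles and snippets, for illustration only.
decisions = {
    "People v. A (hypothetical)": "The conviction for robbery is affirmed.",
    "People v. B (hypothetical)": "The conviction for robbery is reversed.",
    "People v. C (hypothetical)": "The conviction for homicide is affirmed.",
}
index = build_index(decisions)
print(search(index, "robbery affirmed"))  # {'People v. A (hypothetical)'}
```

As with the manual method, the index only finds the cases; the researcher still reads each one to see whether the ruling was affirmed or reversed.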

Since jurisprudence is shaped largely by past precedents under the doctrine of stare decisis, lawyers and judges can rely on past cases to support their arguments and decisions. Often, one will discover strings of cases with similar facts and law that the Supreme Court decided in the same way. These cases are not difficult to spot, and in them the Supreme Court itself often cites the previous cases on which it relied in arriving at its ruling.

Dr. Atty. Noel G. Ramiscal’s February 22, 2020 MCLE lecture, where he discussed his critique of the DOST ASTI paper on predicting Supreme Court decisions

The NLP approach used by the DOST ASTI people would probably make research faster. But strangely enough, the study itself did not contain any data comparing the time an ordinary Philippine court researcher takes to research cases relevant to a specific case s/he is working on with the time taken by the DOST ASTI NLP/machine learning process. The paper did not factor actual research speed into the study as a specific variable, which makes one wonder about the paper’s utility.

To me, as a legal researcher, litigator and advocate of diverse rights and interests, what would be most useful is an algorithm that can guide judges, researchers and advocates toward possible fair outcomes in cases where the facts and issues presented are quite novel, or even in cases where the facts and issues are familiar but following judicial precedent would bring injustice because of changed circumstances and societal mores. Here, the NLP approach in the ASTI study would not be feasible, because the textual approach ignores the historical, social, economic and political conditions that give birth to cases. In fact, the NLP approach would probably not be conducive to judicial progress and reform in cases of this nature.

Exclusion of Stop Words and Punctuation Marks

I note that the authors excluded “stop words”, which apparently do not add to the significance of the text, citing the “NLTK Library” and their own list of Tagalog stop words. The question is: was this “bag” of excluded words vetted and actually found inapplicable in the Philippine criminal law context? If so, who deemed these words inapplicable or useless?

Law has a branch of study called statutory construction, which contains rules that supposedly govern the reading and interpretation of laws and their application to cases. Lawyers are taught that some words have specific legal connotations that may differ from their ordinary signification. The use of conjunctive words like “and” and disjunctive words like “or” can change the meaning or interpretation of a legal provision. Yet these conjunctive and disjunctive words were excluded from the bag of relevant words because they apparently do not add to the meaning of the text.
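To make the concern concrete, here is a minimal Python sketch of what happens when “and” and “or” are treated as stop words. The two penalty clauses are hypothetical wording, and the stop-word set is a small illustrative subset (NLTK’s English stop-word list does include “and”, “or”, “by”, “a”, “of” and “the”); this is not the authors’ pipeline.

```python
import re
from collections import Counter

# Hypothetical stop-word subset; NLTK's English stop-word list also
# contains all of these words, including "and" and "or".
STOP_WORDS = {"a", "by", "of", "the", "and", "or"}

def bag_of_words(text: str) -> Counter:
    """Keep only letters, lowercase, drop stop words, and count the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

# Two hypothetical penalty clauses with different legal meanings:
# the first imposes both penalties, the second permits either one.
conjunctive = "punishable by a fine and imprisonment"
disjunctive = "punishable by a fine or imprisonment"

print(conjunctive == disjunctive)                              # False
print(bag_of_words(conjunctive) == bag_of_words(disjunctive))  # True
```

Once the conjunction is stripped, the model sees the two clauses as identical, even though a court would read them very differently.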

The study also stated that it “remove(d) punctuation marks, numbers, and any other characters that were not letters” (p. 133). Yet punctuation marks such as the comma (“,”) and the semicolon (“;”) can also change the import of legal provisions.
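A similar effect follows from removing punctuation. The Python sketch below approximates the preprocessing quoted above by keeping only runs of letters; it is an assumption about how such a step might be implemented, not the authors’ code, and it simplifies further by keeping ASCII letters only (so a letter like “ñ” would also be dropped). The two provisions are hypothetical.

```python
import re

def letters_only(text: str) -> list[str]:
    """Approximate the reported preprocessing: lowercase and keep only runs of
    ASCII letters, discarding punctuation, numbers, and other characters."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical provisions. Without commas, only the employees who were absent
# are penalized (restrictive clause); with commas, the clause implies that all
# employees were absent, so all are penalized (non-restrictive clause).
restrictive = "Employees who were absent shall be penalized."
nonrestrictive = "Employees, who were absent, shall be penalized."

print(letters_only(restrictive) == letters_only(nonrestrictive))  # True
```

After preprocessing, the two commas that change who is covered by the provision leave no trace in the data.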

It is not clear from the study whether such differences were actually considered in deciding which words went into the “bag-of-words” fed to the NLP or ML framework the authors adopted. This is a grave concern that goes to the accuracy and reliability of that framework.