ChatGPT, a sibling model of InstructGPT, has gained significant traction recently. With over 1 million users interacting with the chat model, public reaction has energised discourse on education, public policy, legal practice and much else. However, the model's success so far is not a pessimistic development for technology governance. Put it this way: many technological models of this kind are evolutionary from a technical standpoint, and the edge they appear to achieve may well be amplified by technology hype cycles.
Still, there is no doubt that ChatGPT is a distinctive innovation, considering its reinforcement learning approach. While people may claim that this piece of technology is capable of replacing Google, whether its generic use cases bear that out remains open. In fact, it was not long ago that Meta produced an algorithmic system, Cicero, to negotiate and strategise with human players in the game of Diplomacy, which the company claims to be in the realm of AI Diplomacy. Cicero achieved extraordinary scores in the strategy game on webDiplomacy.net. That is praiseworthy, and if the system is analysed closely, much could be understood about its generic and non-generic use cases.
This article analyses the use of reinforcement learning in ChatGPT, based on its use cases and distribution relevance, from a regulatory and policy perspective. It also offers concluding insights on where the trajectory of reinforcement learning could invite regulatory oversight, with a sectoral focus.
Reinforcement Learning at a Glance
Reinforcement learning (RL) is a subset of machine learning practised by developers and data scientists across the globe. It holds a special place in the field of artificial intelligence and law because of its ubiquitous applications. RL is a machine learning method in which an AI system, or agent, learns by interacting with a specific environment: it takes actions, receives rewards or penalties, and adjusts its behaviour to maximise its notion of cumulative reward. This recalls behavioural economics, where cumulative rewards are what induce an agent to act in the pattern expected of it. Using reinforcement learning, it is quite possible to build use cases in which the agent learns an optimal, or nearly optimal, policy that maximises the reward function or another user-provided reinforcement signal accumulated from immediate rewards. The products and services discussed in this article (ChatGPT, Cicero and GitHub Copilot) are inspired by or based on reinforcement learning.
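To make the agent-environment loop concrete, here is a minimal sketch of tabular Q-learning on a toy "corridor" task. The environment, rewards and hyperparameters are invented purely for illustration; real RL systems such as those behind ChatGPT operate at a vastly larger scale, but the explore-act-reward-update cycle is the same.

```python
import random

# A toy "corridor" environment: the agent starts at position 0 and earns a
# reward of +1 only upon reaching position 4; every other step costs -0.01.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left or right

def step(state, action):
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else -0.01
    done = next_state == GOAL
    return next_state, reward, done

# Q-learning: the agent refines its estimate of cumulative reward (Q-values)
# from experience, balancing exploration (random moves) and exploitation.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:
        if random.random() < epsilon:
            action = random.choice(ACTIONS)                     # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        next_state, reward, done = step(state, action)
        # Update towards reward plus discounted value of the best next action.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# After training, the learned policy moves right (+1) at every position.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

The "policy" the article refers to is exactly this mapping from situations to preferred actions, learned purely from the reward signal rather than from labelled examples.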
Looking at policy realities as they stand, RL can be subject to heavy supervision protocols, which may succeed or fail in shaping the AI system accordingly. For example, the State-Action-Reward-State-Action (SARSA) model is on-policy: the algorithm accumulates reward while acting in line with its current policy, which represents a probable set of outcomes to be achieved, and updates its estimates from the actions it actually takes. This is natural in any SARSA setup. In the case of Deep Q-Networks (which also operate at the level of neural networks), by contrast, the RL algorithm has to learn and self-explore, developing its own value estimates of the best available actions alongside the RL techniques already in place.
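The contrast between the two families can be sketched with their tabular update rules. This is a simplified illustration with placeholder values, not a Deep Q-Network itself (DQNs replace the table with a neural network), but it shows the on-policy versus off-policy distinction the paragraph describes.

```python
# Contrast between the on-policy SARSA update and the off-policy Q-learning
# update used (in deep form) by Deep Q-Networks. Symbols: s, a are the current
# state and action; r the reward; s2 the next state; a2 the action actually
# chosen next under the current policy. All values are illustrative.

alpha, gamma = 0.1, 0.9  # learning rate and discount factor

def sarsa_update(q, s, a, r, s2, a2):
    """On-policy: bootstrap from the action the policy actually takes next."""
    q[(s, a)] += alpha * (r + gamma * q[(s2, a2)] - q[(s, a)])

def q_learning_update(q, s, a, r, s2, actions):
    """Off-policy: bootstrap from the best next action, regardless of what
    the (possibly exploratory) behaviour policy does."""
    best = max(q[(s2, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best - q[(s, a)])

# Example: the exploratory policy takes a suboptimal next action ('left'),
# so the two rules produce different updates for the same transition.
actions = ['left', 'right']
q = {('s1', 'left'): 0.0, ('s1', 'right'): 0.0,
     ('s2', 'left'): 0.2, ('s2', 'right'): 1.0}

sarsa_q = dict(q)
sarsa_update(sarsa_q, 's1', 'right', 0.0, 's2', 'left')     # follows policy: bootstraps from 0.2
ql_q = dict(q)
q_learning_update(ql_q, 's1', 'right', 0.0, 's2', actions)  # greedy: bootstraps from 1.0

print(sarsa_q[('s1', 'right')])  # 0.1 * 0.9 * 0.2, roughly 0.018
print(ql_q[('s1', 'right')])     # 0.1 * 0.9 * 1.0, roughly 0.09
```

For regulators, the distinction matters: an on-policy learner is constrained by the policy it is given, while an off-policy, self-exploring learner develops its own estimates of what is best, which is harder to audit.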
In the next section, certain products and services developed through reinforcement learning are analysed to estimate their impact, purpose and limitations from a regulatory standpoint.
Regulating Reinforcement Learning
Regulations cannot curb innovation. However, various D9 countries are at a tipping point: as recognisable and usable disruptive technologies proliferate, it is concerning if governments deny or overlook the issues attached to them. We also understand that multiple classes of artificial intelligence technologies of socio-economic and socio-technical value have to be observed carefully, with a sector-to-sector regulatory approach. Governments have already started developing generalised and some sector-specific regulatory methods, especially for recommendation algorithms, recognition services, predictive algorithms and other relevant tech products and services, including analytics. RL is a unique case, but to keep matters simple (unlike recommendation algorithms, where narrow regulatory outlooks might help), some entrant regulatory breakthroughs could be helpful.
Let's discuss ChatGPT to understand its scope.
ChatGPT's Omnipresent and Omnipotent Features
ChatGPT is a perfect, relevant and evolving example of omnipresent and omnipotent AI technologies. Let's understand how it works, as explained by the OpenAI Team.
The diagram is largely self-explanatory, so let us analyse the sub-steps within each step. In Step 1, demonstration data collected from human labellers is used to fine-tune GPT-3.5 with supervised learning. This is interesting because, one way or another, you have to encapsulate the AI system's outlook within the data available to it. In Step 2, a reward model is trained to rank the outputs received. Step 3 builds on the first two: the supervised policy is used to initialise the Proximal Policy Optimisation (PPO) model, and the reward model (RM) created in Step 2 calculates the reward, as denoted in the diagram, which PPO uses to update the policy supervising the system.
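The reward-modelling and PPO steps above can be sketched in miniature. This is a hedged illustration, not OpenAI's implementation: a real reward model scores whole token sequences with a fine-tuned transformer, and PPO operates over per-token log-probabilities with a KL penalty against the supervised policy. The function names and numbers here are invented for exposition.

```python
import math

# Step 2 in miniature: the reward model is trained so that the response a
# human ranked higher receives a higher scalar reward. A standard choice is
# the pairwise ranking loss  -log(sigmoid(r_chosen - r_rejected)).
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward_model_loss(r_chosen, r_rejected):
    return -math.log(sigmoid(r_chosen - r_rejected))

# The loss shrinks when the model scores the preferred answer higher.
print(reward_model_loss(2.0, 0.0))  # small: ranking agrees with the human
print(reward_model_loss(0.0, 2.0))  # large: ranking disagrees

# Step 3 in miniature: PPO's clipped surrogate objective. The policy update
# is scaled by the reward (advantage) but clipped so the new policy cannot
# drift too far from the old one in a single step.
def ppo_objective(prob_new, prob_old, advantage, eps=0.2):
    ratio = prob_new / prob_old
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)

# Even if the new policy triples the probability of a well-rewarded token,
# the objective only credits it up to the clip range (1.2x here).
print(ppo_objective(0.6, 0.2, advantage=1.0))
```

The clipping is the "supervision" the diagram alludes to: the reward model pulls the policy towards human preferences, while PPO keeps each update conservative.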
Now, as per a blog published by OpenAI on March 3, 2022, Reinforcement Learning from Human Feedback (RLHF) creates a further layer of reinforcement learning to make ChatGPT, in OpenAI's words, more "safe and useful". Let us look at Figure 3, which excerpts an example of how the same question posed to InstructGPT and ChatGPT plays out. Taking this question as a reference, InstructGPT gives a blunt response, as the text indicates, whereas ChatGPT's response is more explorative. This, in itself, is iterative deployment. OpenAI also has a difficult task ahead, which they describe in their note on Aligning Language Models to Follow Instructions:
Right now, InstructGPT is trained to follow instructions in English; thus, it is biased towards the cultural values of English-speaking people. We are conducting research into understanding the differences and disagreements between labelers’ preferences so we can condition our models on the values of more specific populations. More generally, aligning model outputs to the values of specific humans introduces difficult choices with societal implications, and ultimately we must establish responsible, inclusive processes for making these decisions.
Achieving parity across multiple specific human groups, at both group and individual levels, would be a tough task, which is why some regulatory perspective must be developed gradually. Here is a passage from a paper published by OpenAI researchers in 2019, The Role of Cooperation in Responsible AI Development (2019, p. 13):
Features that affect the likelihood and severity of a collective action problem for responsible development can be used to decrease its likelihood and severity if they are features that we can control. For example, fundamental distrust between companies is likely to worsen a collective action problem because companies are less likely to expect that their cooperation will be reciprocated (low trust). Building trust between AI companies can therefore decrease the severity of collective action problems.
Since ChatGPT is omnipresent and omnipotent, there is no doubt that self-sufficient efforts have already begun to make the service more human-centric. Still, regulators across the world may differ in their oversight approaches. For example, Europe may approach ChatGPT through a human rights lens, because scholarship in the EU and the Council of Europe refers to human-centric AI as AI that is risk-sensitive or acts in line with the human rights obligations that apply in Europe. The United States will likely take a proactively open approach, where State-level positions in California, New York and other tech-sensitive states may be important to watch. India might take a responsibility- and risk-centric approach, oriented towards clarity of compliance with Indian regulations and laws. China would be unambiguously restrictive, while ASEAN member states and Japan may be risk-centric according to their own capabilities and concerns. In short, leaving aside the high-handed approach that Europe and China may take, ChatGPT is safe as of now. Where ChatGPT becomes critical to study, however, is the phenomenon of artificial intelligence hype.
AI hype is defined in VLiGTA-TR-001, our report for the Vidhitsa Law Institute, as follows:
An Artificial Intelligence hype cycle is perpetuated to influence or generate market perception in a real-time scenario such that a class of Artificial Intelligence technology as a product / service is used in a participatory or preparatory sense to influence or generate the hype cycle.
Since ChatGPT has become popular and relevant to the many use cases being proposed for it, the risk of artificial intelligence hype is not going away. For now, considering efforts such as RLHF put into use by OpenAI, it may be inferred that the general, principled use of ChatGPT is sound from a commercial-technical perspective. The by-products developed with the help of ChatGPT will be interesting to watch, since many multi-sector products and services would necessitate a different approach to self-regulation that OpenAI may have to adopt in future. DoNotPay is a recent example worth pondering. Here is a tweet from Joshua Browder of DoNotPay:
Consider an example provided by Joshua Browder of DoNotPay about using ChatGPT to draft legal statements:
Joshua claims that ChatGPT may "replace lawyers". More modestly, even with limited accuracy, ChatGPT may ease much of the legal drafting work for ordinary matters for many people. We therefore tried a few samples of our own, asking ChatGPT to draft petitions.
Question asked to ChatGPT: Draft an Arbitration Application to the Delhi High Court under Section 11(6) of the Arbitration and Conciliation Act, 1996, making clear that the position on the maintainability of the petition is explained. Also interpret why the application has been filed, by interpreting Section 11(2), Section 11(5), Section 11(6) and Section 15(2) of the Arbitration and Conciliation Act, 1996 in the application.
To the Hon'ble Delhi High Court,
Re: Arbitration Application under Section 11(6) of the Arbitration and Conciliation Act, 1996
I, (name of the applicant), respectfully submit this application for appointment of an arbitrator in accordance with Section 11(6) of the Arbitration and Conciliation Act, 1996.
I hereby state that:
I have entered into a contract with (name of the other party) on (date) for (nature of the contract).
A dispute has arisen between us in relation to the said contract and we have been unable to resolve the dispute amicably.
We had agreed to resolve our disputes through arbitration in accordance with the arbitration clause in the said contract.
I have already served a notice of arbitration to (name of the other party) on (date) but (he/she) has failed to appoint an arbitrator or to respond to my notice within the time specified in the arbitration clause.
I am now making this application under Section 11(6) of the Arbitration and Conciliation Act, 1996 for appointment of an arbitrator by the Hon'ble Delhi High Court.
I have also enclosed the copy of the contract, the arbitration clause, and the notice of arbitration served on (name of the other party) along with this application.
I have also referred to the concerned sub-sections (2), (5) and (6) of Section 11 read with Section 15(2) of the Arbitration and Conciliation Act, 1996 in this application.
I pray that this Hon'ble Court may be pleased to appoint an arbitrator in accordance with Section 11(6) of the Act and grant such other reliefs as may be deemed just and proper in the circumstances of the case.
I humbly submit that this application is maintainable as per the provisions of the Arbitration and Conciliation Act, 1996. Section 11(2) of the Act provides that the party who intends to initiate arbitration proceedings shall appoint an arbitrator. However, in case the other party fails to appoint an arbitrator or fails to respond to the notice of arbitration within the time specified in the arbitration clause, then the party who has served the notice of arbitration may make an application to the Court under Section 11(5) of the Act. The Court shall then appoint an arbitrator in accordance with the provisions of Section 11(6) of the Act.
Furthermore, as per Section 15(2) of the Act, the Court shall appoint the arbitrator on the basis of the rules contained in the arbitration agreement or in the absence of such rules, in accordance with the rules of the institution to which the parties have submitted the arbitration.
I seek the intervention of this Hon'ble Court for appointment of an arbitrator in the above-mentioned matter in accordance with the provisions of the Arbitration and Conciliation Act, 1996.
(Name of the applicant) (Address of the applicant)
Question asked to ChatGPT: Draft an Appeal to the Income Tax Appellate Tribunal under the Income Tax Act, 1961 pursuant to an order passed by the Commissioner of Income-Tax (Appeals) under Section 270A of the Income Tax Act, 1961.
To the Income Tax Appellate Tribunal,
Re: Appeal against the order passed by the Commissioner of Income-Tax (Appeals)
I, (name of the appellant), hereby file this appeal against the order dated (date) passed by the Commissioner of Income-Tax (Appeals) under Section 270A of the Income Tax Act, 1961.
I state that:
I am an assessee and have filed my income tax return for the assessment year (year) on (date).
The assessing officer has passed an order dated (date) determining my total income at (amount) and imposing a tax liability of (amount) along with interest and penalty.
I have filed an appeal against the said order before the Commissioner of Income-Tax (Appeals) on (date) and the Commissioner has passed an order dated (date) dismissing my appeal.
I am now filing this appeal before the Income Tax Appellate Tribunal against the order passed by the Commissioner of Income-Tax (Appeals).
I have enclosed the copies of the order passed by the assessing officer and the Commissioner of Income-Tax (Appeals) along with this appeal.
I have also deposited the requisite fees and deposited the amount of tax and interest as per the order of the Commissioner of Income-Tax (Appeals) in the Government Treasury.
I pray that this Hon'ble Tribunal may be pleased to admit this appeal and allow the same on merits. I also pray that this Hon'ble Tribunal may be pleased to grant such other reliefs as may be deemed just and proper in the circumstances of the case.
(Name of the appellant) (Address of the appellant)
Nevertheless, reading ChatGPT's responses, some people may consider these draft petitions infallible and fit for "daily" use. That is not pragmatic, because context-based analysis cannot be limited or crystallised so easily; real use cases are far more complicated to handle. Moreover, ChatGPT also challenges the element of trust built into human environments.
Now, suppose you use ChatGPT to build some by-product, as in the case of consumer-experience-based AI, where the focus is on the technologically rendered experiences of exploration that the AI system offers. Whether this downgrades the economic necessity of human-involved trust is debatable, since the use of ChatGPT has yet to proliferate enough to show such use cases. That does not mean the system should be discouraged. Rather, ChatGPT should be treated as a quicker way to explore knowledge, information, facts, anecdata and insights. That explorative involvement may differ with every use case, whether its purpose is macro or micro.
There is another issue with explorative involvement. Exploring something does not mean you know a subject matter comprehensively. ChatGPT is capable of giving explanations but sometimes makes errors, in effect "gaslighting" the user by presenting facts and anecdata in a jumbled and unclear manner. In fact, contrary to popular notion, ChatGPT may force users to understand the basics of any explored knowledge, information, fact, anecdata or insight for themselves. If you are aware of the concepts and notions of a subject matter at a reasonable level, ChatGPT may help you analyse and revisit your explorative involvement to find answers. However, the element of trust can be shaky when it comes to information flow and the information economy while using ChatGPT.
ChatGPT is a lucrative RL-based service that could, in some ways, help embrace sociotechnical mobility. Like the algorithms run by various social media companies, which have become general, bloated and omnipotent in use, ChatGPT's potential by-products may become omnipotent and omnipresent as well. A preliminary regulatory outlook may be needed, since information warfare and information overload are critical problems for the knowledge societies that thrive in the digital world. How ChatGPT would affect the interconnected vulnerabilities attached to digital public and private products is too crucial to ignore. In addition, the impact of ChatGPT on digital public infrastructure will be important to watch in the future.