Big Data, Law & Regulating Big Data
By Dr. Vinod Surana
The term “Big Data” has been labelled, amongst other things, a buzzword or a new-age “marketing term”[i], but a reading of the World Economic Forum’s write-up on Big Data clarifies that Big Data is not a “new or isolated phenomenon”, but another phase (albeit a crucial one that will change the way businesses and society function) in a long evolution of how data is captured and used[ii].
But first, what is Big Data?
Simply put, Big Data refers to data sets so large and often so complex that the conventional software tools long used to analyse data are no longer capable of handling them[iii]. Most literature[iv] on Big Data distinguishes it from other data, and more specifically from previous data analytics movements, by four characteristics: Volume, Velocity, Variety and Veracity.
Volume is the most apparent and best-known factor of all and refers to the large amounts of data that are constantly being produced by various entities (individuals, corporations, governments, etc.). Though already mentioned, it bears repeating that data is now produced at such scale and complexity that conventional data analytics tools are no longer efficient and are sometimes entirely obsolete.
Velocity refers to the speed with which data is created, generated or produced. High velocity data is generated with such speed that it requires unique analytical and processing tools. An example of data that is generated with high velocity would be social media posts on popular platforms like Twitter, Facebook and Instagram[v]. The speed with which data is generated can be even more significant for certain businesses than the volume of data generated. Real-time or nearly real-time information can provide businesses, analysts and managers an “obvious competitive advantage”[vi].
Variety refers to the range of sources and forms of Big Data. Big Data is generated from a variety of sources and is generally one of three types: structured, semi-structured or unstructured data[vii]. Depending on the individual, industry or organization, the Big Data generated comprises information from a multitude of sources such as transactions, social media, enterprise content, sensors and mobile devices[viii], and can take various forms including photos, sound recordings and written text.
Veracity is often the most debated factor of Big Data. It refers to the quality, authenticity and reliability of the data generated and of its source. The usability and relevance of Big Data depend heavily on the data being of high veracity.
Why does Big Data matter?
The topic of Big Data featured prominently at the 2012 World Economic Forum (WEF) held at Davos, Switzerland. To coincide with the 2012 meeting, the WEF released a report titled “Big Data, Big Impact”, in which it proclaimed data a new class of economic asset, like currency or gold[ix]. Data has even been described as “the oil of the digital era”[x]. The significance of Big Data is rooted in the principle that the more information is available on any situation or thing, the more reliably one can analyse it and gain valuable insights that can positively impact individuals, corporations and other organisations by enabling smarter decision making and improved performance[xi].
With more and more business and social activities now being digitized, data has become critical to rapid, informed decision making across businesses and organizations. However, data on its own has no intrinsic value[xii]. It assumes significance when innovative and nuanced technology and resources are developed and available to analyse these massive amounts of data and turn them into valuable insights. This is where Artificial Intelligence (AI) and Machine Learning play a role in the optimum utilization of Big Data. It is important to note that the relationship between Big Data and AI is reciprocal: each depends on the other for its success. Big Data needs smart tools like AI to unlock its ultimate potential, that is, to transform data into reliable insights for organisations to utilise. Similarly, for AI to be recognised and driven forward, it needs vast amounts of data (basically, Big Data)[xiii].
Significance of Big Data for the Legal Industry / Profession
There seems to be almost universal acceptance of Big Data’s potential to transform businesses, and across nearly every industry the positive impact of Big Data on decision making and business operations can be witnessed. It has never been easier for law firms to improve their business / operations, satisfy and even exceed their clients’ expectations and increase profitability[xiv]. However, the widespread impact of Big Data has yet to substantially permeate the legal industry / profession. In 2019, a data analytics company reported that while most law firms (nearly 90% of the sample) were of the opinion that Big Data and its role in decision making is important, only 16% were actually harnessing Big Data to improve their organisations[xv]. Another survey, by a top business analytics company, of 1000 senior executives across the health care, insurance, legal, science and banking industries as well as government in the United States found that the legal industry finished second last among industries in utilizing Big Data in some form.
Why, then, when few industries generate as much data as law and the clients and businesses it serves are driven by data, is there such widespread, almost systemic reluctance to harness the transformative powers of Big Data? There are several explanations for this, namely, legal culture, the myth of lawyer exceptionalism, the traditional model of rewarding input / labour intensity versus output / results, limited investment in new resources (both human and machine), and resistance to bringing together the practice of law with the business of delivering legal services[xvi].
In spite of this systemic reluctance towards data, clients (notably corporate clients) are increasingly pushing for, and even demanding, that legal professionals base their counsel and legal strategies on data and not just on their acumen or instinct.
There are a number of ways in which the legal industry can successfully harness Big Data to its advantage. To start with, legal service providers will need to rethink their financial investment policy for technology, embrace cultural change and, most importantly, be willing to reconsider conventional beliefs regarding the industry[xvii]. Big Data can benefit the legal industry both in internal organizational matters and in external client-facing aspects. Internally, Big Data can help law firms with time management and billing by providing a more nuanced understanding of the various revenue streams, of which cases / assignments are most profitable and of which teams / individuals are best suited to specific types of cases / assignments, thereby better leveraging the available human resources[xviii].
Big Data alone will not be enough; it is the combination of Big Data and various technological innovations (such as AI or blockchain), including legal analytical tools, that will make the most impact[xix]. Legal data analytics can be classified as Predictive analytics and Descriptive analytics. While Predictive analytics uses legal data analytical tools to anticipate what is likely to happen and therefore make a more educated decision, Descriptive analytics uses existing data and inputs to reach a conclusion[xx]. Predictive analytics can be used to improve document review (disclosure / discovery in litigious matters, due diligence for commercial transactions and other legal work) and other labour-intensive tasks[xxi]. This could result in higher client satisfaction (due to faster turnarounds) and increased profits (by better leveraging human resources). For example, where an organization might ordinarily require a team of lawyers to search through documents manually for something specific, which could be very time consuming, it could instead use Big Data and legal analytical tools to sort large amounts of text, extract text or other patterns from it, and produce only the relevant content for human analysis[xxii]. Predictive analytics could also be used in formulating litigation strategies: by using advanced analytical tools, legal professionals will be able to “synthesis outcomes and prospects” and thereby make more informed decisions[xxiii]. Big Data in combination with these analytical tools can also be used for varied levels of regulatory compliance and for project / contract management, by allowing legal professionals to effectively monitor all pertinent information and by alerting the people managing such tools to any breaches or potential breaches.
Such automated monitoring would reduce the human error factor and, with it, the occurrence of such breaches or violations. Probably one of the least emphasised areas in which Big Data can provide valuable insights to the legal industry is costs and pricing. By utilizing the Big Data that law practices already generate on pricing and billable hours for a range of legal services, they would gain better insight into the likely cost of specific legal services and into different pricing models, including alternative fee arrangements, and would be able to alter and customize pricing models based on the task at hand and the nature of their client. This again provides increased client satisfaction (a result of a more appropriate pricing arrangement) and in turn increases the profits of a law practice[xxiv].
It is clear then that the purpose of incorporating Big Data and analytical tools is not to replace the (human) legal professional, but rather to supplement that professional’s experience and instinct with valuable information obtained from other sources, thereby enhancing decision making, both for internal management and for external client-facing aspects.
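To make the document review scenario discussed in this section more concrete, the following is a minimal sketch in Python of how a tool might sort through a set of documents and surface only those containing patterns of interest for human review. It is an illustration only: the function name `flag_relevant`, the sample documents and the search patterns are all hypothetical, and real legal analytical tools rely on far more sophisticated techniques than simple pattern matching.

```python
import re
from typing import Dict, List


def flag_relevant(documents: Dict[str, str], patterns: List[str]) -> Dict[str, List[str]]:
    """Return, for each document with at least one match, the patterns it matched."""
    # Compile once so each pattern is reused across all documents.
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    flagged: Dict[str, List[str]] = {}
    for name, text in documents.items():
        hits = [c.pattern for c in compiled if c.search(text)]
        if hits:
            flagged[name] = hits
    return flagged


# Hypothetical sample data, invented purely for illustration.
documents = {
    "email_001.txt": "Please find attached the indemnity clause we discussed.",
    "memo_002.txt": "Lunch is booked for Friday.",
    "contract_003.txt": "The Supplier shall indemnify the Buyer; termination requires 30 days notice.",
}
# Hypothetical search terms a reviewing lawyer might care about.
patterns = [r"\bindemni\w*", r"\btermination\b"]

flagged = flag_relevant(documents, patterns)
for name, hits in flagged.items():
    print(name, "matched", len(hits), "pattern(s)")
```

Only the flagged documents would then be passed to a lawyer, which reflects the point made above: the tool narrows the material, while the legal judgment remains with the human professional.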
Regulating Big Data
With the exponential growth and widespread acceptance of the benefits of Big Data, there has also been tremendous attention on the various challenges and risks associated with collecting and utilizing Big Data. If these risks are not appropriately addressed, they could outweigh the benefits of Big Data and also seriously affect public support for more widespread utilization of Big Data across industries[xxv]. However, in spite of this understanding, countries have been slow to regulate (Big) Data. This article, while acknowledging the multiple concerns surrounding Big Data such as quality of data sources, antitrust issues and consumer protection, amongst others, will deal only with the most significant challenge of them all: the issue of privacy.
Most major jurisdictions in the world at present do not have laws that specifically regulate Big Data.
In the United States, there is no overarching or single data protection or privacy law[xxvi]. Instead, any company or organization looking to engage in Big Data activities must comply with a range of different regulations, including sector-specific privacy laws that govern the data involved in their operations, contractual requirements, and other industry and / or region-specific regulations applicable to these businesses. This is termed a patchwork of laws and can be seen in many other jurisdictions as well[xxvii]. At the federal level, the U.S. Federal Trade Commission (FTC) has the power to enforce data protection regulations; however, owing to the country’s federalist structure, actual enforceability is in doubt and most regulation sits at the state level. This leads to even more confusion, as state regulations are often incompatible with each other, hindering companies that operate across state lines[xxviii]. Organisations engaging in Big Data activities also need to ensure that they are compliant with any industry or region-specific regulations, such as the Health Insurance Portability and Accountability Act (HIPAA), which governs the use of “Protected Health Information”, the Children’s Online Privacy Protection Act and the Computer Fraud and Abuse Act. Region-specific regulations include the Massachusetts Data Security regulations and the California Online Privacy Protection Act[xxix]. However, there have been various committees and recommendations pushing for a comprehensive, all-encompassing national data privacy legislation[xxx].
In 2018, the European Union (EU) replaced the earlier EU Data Protection Directive with the General Data Protection Regulation (GDPR). The GDPR lays down a baseline set of standards that companies dealing with the data of individuals in the EU must comply with, including requiring the consent of data subjects for processing of their information, anonymizing the data collected to protect the privacy of data subjects, providing data breach notifications, provisions for the safe transfer of data across borders, and mandated requirements for companies to appoint a data protection officer to oversee GDPR compliance[xxxi]. The GDPR is acknowledged as the most comprehensive data protection law currently in the world, due to a number of factors. The most significant is its sweeping territorial reach: it applies not only to organisations operating in the EU but also to organisations based and operating overseas that handle and process the data of individuals in the EU.
India at present has neither a comprehensive, dedicated data protection legislation nor any regulations pertaining specifically to Big Data. However, the Information Technology Act, 2000 was amended to include the Information Technology (Reasonable Security Practices and Procedures and Sensitive Personal Data or Information) Rules, 2011 for the protection of personal data. In 2019, the Government of India introduced the Personal Data Protection Bill 2019 (PDP Bill). A Joint Parliamentary Committee is currently considering the PDP Bill and a revised draft of the PDP Bill is expected to be issued sometime in 2020. The PDP Bill would then have to be passed by both houses of Parliament and notified in the official gazette before it becomes law. Even after enactment, the law is likely to be implemented in a phased manner, and currently there is no information about the implementation timeline. Once enacted, this will be India’s first law on the protection of personal data and will repeal the relevant amended sections of the Information Technology Act and the 2011 Rules. While the PDP Bill, similar to the GDPR, does not specifically contain any provisions relating to Big Data, its implementation in India will have a far-reaching impact and is expected to regulate Big Data activities as well.
In conclusion, while it is widely acknowledged that there are certain challenges to handling Big Data, the use of Big Data will undoubtedly become an important distinguishing factor in the growth and competitive advantage of organisations, by enhancing decision making and productivity and by providing more than appreciable value for these organisations. Countries will step up to further enable this while ensuring their citizens are protected by introducing comprehensive regulations[xxxii]. The key, though, will be to see how inclusive, business-friendly and data-forward these regulations will be.
[i] Steve Lohr, ‘The Age of Big Data’ The New York Times (New York, 11 February 2012) https://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html
[ii] Bernard Marr, ‘A brief history of big data everyone should read’ World Economic Forum Agenda (25 February 2015) https://www.weforum.org/agenda/2015/02/a-brief-history-of-big-data-everyone-should-read/
[iii] ‘Big Data Explained’ https://www.mongodb.com/big-data-explained
[iv] ‘The Four Vs of Big Data’ IBM Big Data & Analytics Hub (Infographics & Animations) https://www.ibmbigdatahub.com/infographic/four-vs-big-data ; ‘Big Data-A Complex & Evolving Regulatory Framework’ European Commission- Digital Transformation Monitor (January 2017) https://ec.europa.eu/growth/tools-databases/dem/monitor/sites/default/files/DTM_Big%20Data%20v1_0.pdf ; Andrew McAfee and Erik Brynjolfsson, ‘Big Data: The Management Revolution’ Harvard Business Review (October 2012) https://hbr.org/2012/10/big-data-the-management-revolution
[v] ‘The Four Vs of Big Data’ Enterprise Big Data Framework https://www.bigdataframework.org/four-vs-of-big-data/
[vi] Andrew McAfee and Erik Brynjolfsson, ‘Big Data: The Management Revolution’ Harvard Business Review (October 2012) https://hbr.org/2012/10/big-data-the-management-revolution
[vii] ‘The Four Vs of Big Data’ Enterprise Big Data Framework https://www.bigdataframework.org/four-vs-of-big-data/
[viii] ‘The Four Vs of Big Data’ IBM Big Data & Analytics Hub (Infographics & Animations) https://www.ibmbigdatahub.com/infographic/four-vs-big-data
[ix] ‘Big Data Big Impact: New Possibilities for International Development’ World Economic Forum 2012 http://www3.weforum.org/docs/WEF_TC_MFS_BigDataBigImpact_Briefing_2012.pdf ; Andrew McAfee and Erik Brynjolfsson, ‘Big Data: The Management Revolution’ Harvard Business Review (October 2012) https://hbr.org/2012/10/big-data-the-management-revolution
[x] ‘Regulating the Internet Giants: The world’s most valuable resource is no longer oil but data’, The Economist (6 May 2017) https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data; ‘Data is the New Oil of the Digital Economy’, The Wired (2014) https://www.wired.com/insights/2014/07/data-new-oil-digital-economy/#:~:text=Data%20in%20the%2021st%20Century,is%20more%20valuable%20than%20ever
[xi] ‘What is Big Data’, Bernard Marr & Co. https://www.bernardmarr.com/default.asp?contentID=766 ; Andrew McAfee and Erik Brynjolfsson, ‘Big Data: The Management Revolution’ Harvard Business Review (October 2012) https://hbr.org/2012/10/big-data-the-management-revolution
[xii] ‘Why is Law So Slow to Use Data?’, Mark A. Cohen Forbes (24 June 2019) https://www.forbes.com/sites/markcohen1/2019/06/24/why-is-law-so-slow-to-use-data/#20df23f0b8eb
[xiii] ‘How Big Data & AI work together’, Kevin Casey The Enterprisers Project https://enterprisersproject.com/article/2019/10/how-big-data-and-ai-work-together
[xiv] ‘Why is Law So Slow to Use Data?’, Mark A. Cohen Forbes (24 June 2019) https://www.forbes.com/sites/markcohen1/2019/06/24/why-is-law-so-slow-to-use-data/#20df23f0b8eb
[xv] ‘Big Data Transforming the Legal Industry’ Turrito Analytics (17 July 2019) http://www.turritoanalytics.com/2019/07/17/big-data-transforming-the-legal-industry/
[xvi] ‘Why is Law So Slow to Use Data?’, Mark A. Cohen Forbes (24 June 2019) https://www.forbes.com/sites/markcohen1/2019/06/24/why-is-law-so-slow-to-use-data/#20df23f0b8eb
[xvii] ‘Why is Law So Slow to Use Data?’, Mark A. Cohen Forbes (24 June 2019) https://www.forbes.com/sites/markcohen1/2019/06/24/why-is-law-so-slow-to-use-data/#20df23f0b8eb
[xviii] ‘Big Data Transforming the Legal Industry’ Turrito Analytics (17 July 2019) http://www.turritoanalytics.com/2019/07/17/big-data-transforming-the-legal-industry/
[xix] ‘When Big Data Meets Big Law’ Thomson Reuters-Legal Insights Europe (20 September 2019) https://blogs.thomsonreuters.com/legal-uk/2019/09/20/when-big-data-meets-big-law/
[xx] ‘Impact of Big Data in the Legal Industry’, Priya Dialani Analytics Insight (20 December 2018) https://www.analyticsinsight.net/impact-of-big-data-in-the-legal-industry/
[xxi] ‘Opportunities for Big Data + Big Law’ Holli Sargeant Medium Data Series (28 February 2019) https://medium.com/dataseries/opportunities-for-big-data-big-law-cd1318414497
[xxii] ‘Why We’re Training the Next Generation of Lawyers in Big Data’, Anne Tucker and Charlotte Alexander Government Technology (2 October 2018) https://www.govtech.com/workforce/Why-Were-Training-the-Next-Generation-of-Lawyers-in-Big-Data.html ; ‘When Big Data Meets Big Law’ Thomson Reuters-Legal Insights Europe (20 September 2019) https://blogs.thomsonreuters.com/legal-uk/2019/09/20/when-big-data-meets-big-law/
[xxiii] ‘Opportunities for Big Data + Big Law’ Holli Sargeant Medium Data Series (28 February 2019) https://medium.com/dataseries/opportunities-for-big-data-big-law-cd1318414497
[xxiv] ‘Opportunities for Big Data + Big Law’ Holli Sargeant Medium Data Series (28 February 2019) https://medium.com/dataseries/opportunities-for-big-data-big-law-cd1318414497
[xxv] Bruce Schneier, Data and Goliath a Portrait of Big Data Abuses, W.W. Norton & Company (March 2015)
[xxvi] ‘Ten Questions for Future Regulation of Big Data: A Comparative and Empirical Legal Study’, Bart van der Sloot and Sascha van Schendel Journal of Intellectual Property, Information Technology and E-Commerce Law (2016) https://www.jipitec.eu/issues/jipitec-7-2-2016/4438
[xxvii] ‘Reforming the U.S. Approach to Data Protection and Privacy’ Digital and Cyberspace Policy Program, Council on Foreign Relations (30 January 2018) https://www.cfr.org/report/reforming-us-approach-data-protection ; ‘How Should the U.S. Legislate Data Privacy’, Jack Karsten Brookings Institution (30 July 2018) https://www.brookings.edu/blog/techtank/2018/07/30/how-should-the-us-legislate-data-privacy/
[xxviii] ‘U.S. Privacy Laws: State Level Approaches to Privacy Protection’ Ryan Brooks (3 July 2020) https://blog.netwrix.com/2019/08/27/data-privacy-laws-by-state-the-u-s-approach-to-privacy-protection/
[xxix] ‘Regulation of Big Data in the United States’, Jacqueline Klosek Taylor Wessing Global Data Hub (July 2014) https://globaldatahub.taylorwessing.com/article/regulation-of-big-data-in-the-united-states ; ‘Determining the Regulations Big Data Should Face’ The Alacer Group Insights https://www.alacergroup.com/big-data-regulations/
[xxx] ‘Regulation of Big Data in the United States’, Jacqueline Klosek Taylor Wessing Global Data Hub (July 2014) https://globaldatahub.taylorwessing.com/article/regulation-of-big-data-in-the-united-states
[xxxi] ‘Big Data and security policies: Towards a framework for regulating the phases of analytics and use of Big Data’, Broeders et al. (2017) Computer Law & Security Review, Vol. 33 (3): 309-323 https://www.researchgate.net/publication/315955343_Big_Data_and_security_policies_Towards_a_framework_for_regulating_the_phases_of_analytics_and_use_of_Big_Data ; ‘What is the General Data Protection Regulation? Understanding and Complying with GDPR Requirements in 2019’ Juliana De Groot, Data Insider, Digital Guardian (5 August 2020) https://digitalguardian.com/blog/what-gdpr-general-data-protection-regulation-understanding-and-complying-gdpr-data-protection
[xxxii] ‘Big Data’s Potential for Businesses’ Michael Chui, James Manyika and Jacques Bughin Financial Times & McKinsey Global Institute (13 May 2011) https://www.mckinsey.com/mgi/overview/in-the-news/big-data-potential-for-businesses