Thursday, December 22, 2022

The Impact of Artificial Intelligence on Engineering and Art

The use of Artificial Intelligence (AI) software and the ethical, legal and artistic implications of that use are a growing source of debate and conflict in many fields. Current capabilities are already testing established concepts in copyright protection, intellectual property rights and the very definition of what it means to create art. News stories over the past year offer a good indication of the current state of AI, the conflicts between the rights of past creators and those of present and future ones, the economic impacts of those changing claims of rights, and the future of art itself.

Copyright for Past Works Leveraged by AI

Teams building modern, complex software applications are commonly distributed across nearly every time zone on the planet, allowing continuous 24x7 work. A core necessity of this distributed development is a centralized "vault" (or "repository") to which each developer can push completed code for use by others and from which they can pull completed modules built by others. GitHub is one of the largest online source code repositories, used by developers in every industry for both open source projects and private projects where source code is supposed to be protected.

GitHub was purchased by Microsoft in 2018 in a move which immediately raised suspicions as to Microsoft's intent. At the time, Microsoft pitched the acquisition as a sign that it was fully supporting the open source "movement" while also re-engineering its internal processes to leverage work flows proven successful on open source platforms. By 2022, the larger developer community's suspicions seemed to have been justified when Microsoft announced a tool called CoPilot, aimed at speeding up the "boilerplate" grunt work required in almost any software project.

CoPilot is a process running within a developer's favorite editor that constantly compares what is being typed with keywords and syntax seen from scanning / categorizing the MILLIONS of lines of source code within GitHub. When it finds a match, it can auto-insert a complete programming statement or block of code that is likely an exact match for what the developer was going to type. Even if not an exact match for what the developer intended, the suggested code may only require changing a few variable names, etc., which takes far less time than typing the entire statement or block by hand. In theory, this tool improves quality as well by recycling "best practice" code from existing software that likely already works. All good, right?

For open source developers owning copyrights on those millions of lines of existing code in GitHub, having their code scanned and normalized into generic boilerplate for re-use without attribution or compensation is NOT what they agreed to when putting their code in GitHub and/or publishing the code via open source licenses. The fact that Microsoft trained the CoPilot AI on the content of GitHub BEFORE informing current users that the training was underway denied those copyright owners the ability to remove their code from scanning. Once scanned, it is likely impossible to definitively prove that a suggestion by CoPilot derived from a particular block of previously scanned copyrighted code.

Copyright for AI-Created Works

The creation of CoPilot by Microsoft raised questions about the rights of PRIOR creators whose works become training fodder for AI algorithms to use in creating new works. The next obvious question involves whether AI-created works can themselves gain copyright protection. There's an answer to that question, but it isn't clear that the answer is settled or correct.

https://www.cbr.com/ai-comic-deemed-ineligible-copyright-protection/

In September of 2022, Kris Kashtanova received notification from the US Copyright Office that copyright protection had been granted to a comic book she had created using a "text-to-image" AI engine called Midjourney. On December 20, 2022, she received a second notification from the USCO stating they had made a mistake and that the work was NOT eligible for copyright protection because it had been created with AI tools. The original copyright application mentioned the use of Midjourney but did not attempt to estimate how much of the overall labor of creation had been performed by the AI versus a human.

In general, it seems USCO policy is to NOT grant copyright to AI works, and the initial grant in this case was a mixup. In order to gauge the fairness of the USCO's current policy, it is helpful to review a few examples and summarize how these tools function, whether by speeding up existing manual "grunt work" or by yielding entirely new permutations not likely to have been created by humans. This type of "creative" AI logic differs significantly in its goals from the AI used in "expert systems" that, for example, leverage the past decisions of thousands of doctors to act as a quality check on the next diagnosis, but much of the capability stems from similar pattern recognition and training algorithms.

The current state of the art isn't limited to creating static images for pages in a comic book. A now-common use of AI involves the creation of "music videos." I like rock music and see many examples like these in my YouTube recommendations:

https://www.youtube.com/watch?v=o5xtrK0iF6M -- Black Hole Sun -- Soundgarden
https://www.youtube.com/watch?v=kGLo8tl5sxs -- Echoes -- Pink Floyd
https://www.youtube.com/watch?v=IPtntfM1tJM -- White Rabbit -- Jefferson Airplane

I have not personally tried using these tools to create a video to learn EXACTLY what is involved but have seen descriptions of the current process. In general, the "creator" provides a set of inputs to the engine comprising the following types of information:

  • desired length of the video
  • text containing lyrics or words related to themes or imagery desired
  • key milestones identified by timestamps (offsets from time 00:00) where transitions are desired
  • for each milestone, hints about specific concepts desired for the imagery at that point
  • for each milestone, hints about the type of imagery desired (photo-realistic? cartoonish? abstract / impressionistic? surrealist?)
  • for each milestone, hints about action desired within a milestone scene (static? with motion?)
  • for each milestone, hints about how the imagery should transition from the prior milestone (jump? fade out and in? dissolve? etc.)

These configuration settings are defined in a file or collection of files as essentially script inputs (key point…) and provided to the algorithm, which has already been "trained" by scanning millions of Internet images labeled with descriptive info that can be matched up to the hints in the input. Each milestone is treated as a "clip" which the software creates by rendering images using the nouns and hints parsed from the script input. After each "scene" is created, the engine goes back and uses the remaining hints about transitions between scenes to render the final pass of the entire clip.
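The milestone inputs above can be sketched as plain data. To be clear, every field name below is invented for illustration; each real tool defines its own script format. The one bit of logic included shows the "each milestone becomes a clip" rule: a milestone's clip runs until the next milestone's timestamp (or the end of the video).

```python
# Hypothetical milestone script for an AI video generator. Field names
# are invented for illustration; real tools each define their own format.
video_script = {
    "length_seconds": 240,
    "theme_text": "stormy sea, lighthouse, dawn breaking",
    "milestones": [
        {"at": "00:00", "imagery": "dark rolling waves", "style": "photo-realistic",
         "motion": "slow drift", "transition": None},
        {"at": "01:10", "imagery": "lighthouse beam sweeping", "style": "surrealist",
         "motion": "camera orbit", "transition": "dissolve"},
        {"at": "02:45", "imagery": "sunrise over calm water", "style": "impressionistic",
         "motion": "static", "transition": "fade out and in"},
    ],
}

def to_seconds(ts):
    """Convert an 'MM:SS' offset into seconds."""
    m, s = ts.split(":")
    return int(m) * 60 + int(s)

def clip_lengths(script):
    """Each milestone becomes a clip running until the next milestone
    (or the end of the video), mirroring the per-scene rendering pass."""
    starts = [to_seconds(m["at"]) for m in script["milestones"]]
    ends = starts[1:] + [script["length_seconds"]]
    return [end - start for start, end in zip(starts, ends)]
```

For the script above, the three clips come out to 70, 95 and 75 seconds; the engine would render each one from its imagery / style / motion hints and then stitch them together using the transition hints.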

Because the process is script oriented, whatever part of the process is still manual with current tools can always be further automated with more scripting. For example, additional logic will likely be created to further eliminate human labor by performing these tasks:

  • scanning a music audio file for audio patterns indicating tempo / beats, or detecting rock guitars versus symphonic strings to suggest mood
  • filtering for frequencies in the human voice range to extract that portion of the audio for transcription, providing further hints about the subject matter
  • mapping timestamps within the audio for key transition points (volume transitions, the start of voices for verse and chorus sections, etc.) and feeding those to the final transition planning between "scenes" rather than having a human manually listen to the work and define them
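The third task above is simple enough to sketch directly: scan raw audio samples for sharp loudness jumps and emit candidate transition timestamps, so a human no longer has to listen through the track and mark them by hand. This is a deliberately simplified pure-Python sketch over synthetic samples; a real tool would decode actual audio files and use more robust onset-detection methods.

```python
# Simplified sketch: find candidate scene-transition timestamps by
# detecting sharp loudness (RMS) jumps between adjacent windows.
import math

def rms(window):
    """Root-mean-square loudness of one window of samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))

def find_transitions(samples, rate, window_sec=0.5, jump_ratio=3.0):
    """Return timestamps (seconds) where loudness jumps sharply
    from one window to the next."""
    size = int(rate * window_sec)
    levels = [rms(samples[i:i + size])
              for i in range(0, len(samples) - size + 1, size)]
    hits = []
    for i in range(1, len(levels)):
        prev, cur = levels[i - 1], levels[i]
        if prev > 0 and cur / prev >= jump_ratio:
            hits.append(i * window_sec)
    return hits

# Synthetic "track": 2 seconds of near-silence, then 2 seconds of a
# loud 440 Hz tone -- the kind of jump that marks a verse kicking in.
rate = 8000
quiet = [0.01 * math.sin(2 * math.pi * 440 * t / rate) for t in range(2 * rate)]
loud = [0.8 * math.sin(2 * math.pi * 440 * t / rate) for t in range(2 * rate)]
track = quiet + loud
```

Running `find_transitions(track, rate)` on the synthetic track flags the single jump at the 2.0 second mark; feeding such timestamps into the transition planner replaces the manual listen-and-mark step described above.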

Those with a background in software engineering or video / audio production can read descriptions of this process and instantly recognize how much tedium is eliminated from just the ASSEMBLY of dozens or hundreds of short "scenes" into a final product, much less the actual synthesis of each of those short scenes. Perhaps the key question raised by all of this is: If software is used to create "art," and artificial intelligence algorithms are used both to eliminate grunt work and to perform "synthesis" work, where is the boundary between those two forms of automation, and where is the boundary between "machine product" and "art?"

The challenges of making those distinctions are already being discussed by top-notch players in the field. Here's a link to a video of Rick Beato interviewing Smashing Pumpkins guitarist / songwriter Billy Corgan, in which the impact of AI on the music industry is discussed:

https://www.youtube.com/watch?v=nAfkxHcqWKI

The key segment about the impact of AI on artists and producers begins at 1:25:00 in the clip, but the entire 90-minute interview is worth watching. In that segment, Corgan recounts a trip he made as a child to the Salvador Dali museum in St. Petersburg, Florida. As a child, he was familiar with Dali and how his public persona and flamboyance had influenced the art world, but when he actually saw the body of work in person, he was surprised that Dali could actually PAINT. Really paint. He wasn't just a showman successful at the business of art; he had mastered the craft.

The Beato interview above is one of the better discussions about the impact of artificial intelligence technologies (others abound online), but the vast majority of these discussions treat art primarily as a PRODUCTION problem in creating a consumable PRODUCT, like making Hershey bars or a pair of shoes. Orienting conversations about AI from that perspective leaves a vastly more important aspect of its impact unaddressed.

Misunderstanding Art - Product or Process?

In my senior year of high school, I was fortunate enough to have been forced to read an influential essay (in literature circles, anyway) published in 1925 by Spanish philosopher José Ortega y Gasset. Versions of the book containing the essay are still available for purchase and a PDF of the essay itself is available at the link below for those interested.

https://monoskop.org/images/5/53/Ortega_y_Gasset_Jose_1925_1972_The_Dehumanization_of_Art.pdf

This essay was the first reading assignment in an English class dedicated to analyzing four key genres of literature --- short stories, novels, plays and poetry. The essay acted as the thesis for the entire year course by encouraging students to abstract away from all of the SPECIFIC literary works covered in the class to think in more general terms about what ALL forms of literature (and more generally, all forms of art) are attempting to achieve.

At the time of its writing, the art world was witnessing a backlash against the then-new genre of highly abstract painting, which didn't attempt to use the techniques of shade and perspective perfected over the prior centuries to convey ideas but instead focused on vastly simplified shapes and patterns. Even today, one hundred years later, many people despise these forms of expression, so it's easy to imagine how much revulsion they generated among the public when they were new. The essay, entitled The Dehumanization of Art, attempted to explain why the new emerging styles were literally removing traditional human shapes and conventional representations of space and perspective: as a means of shedding new light on a present which, in the wake of a first heavily industrialized world war, had drastically altered people's perceptions of the world. The essay outlined a way of thinking about art as a PROCESS, rather than a PRODUCT or outcome. Specifically, artistic creation is a perpetual cycle of experience and creation intended to expand the understanding of the world for both the creator and the consumer.

For creators, the process of creating art requires mastering a set of skills in a medium (painting, sculpture, music, words, etc.) established over decades / centuries. This effort in mastering "the basics" not only makes the creator more productive but helps convince potential audiences that the artist is more than a layman or hack and HAS something worthy of consideration. After honing those skills, the creator then applies them to the surrounding environment to yield a unique perspective which is reflected in their creation. The degree of novelty of that perspective is a key element of the "worthiness" of the art created because it is the novelty of the creation still wrapped in the familiarity of the form that attracts the attention of consumers.

However, the metaphysical "worthiness" of that output can be impaired if a unique idea or feeling is "encoded" into the form of the medium in a routine, cookie-cutter, reductive fashion. Imagine rock and roll's evolution STOPPING at the point where the first guitar player mastered the first I - IV - V shuffle riff, with ALL subsequent songs written with that rhythm and chord progression. It doesn't matter what genius is contained in the lyrics or melody; no one would see value in The Sounds of Silence or Let It Be buried in a two-minute, thirty-second rock shuffle after five thousand prior songs had been recorded with the exact same chord progression and rhythm.

As a result, successful "art" cannot just reductively combine new insight with established artistic patterns and yield new worthwhile art. The form of the art itself must be tweaked nearly continuously to yield the required amount of overall novelty to attract an audience and serve its function by sharing an insight between the artist and the audience. Tweaking of the form is required not only between artists working within a form, but even for an artist between works. Part of the challenge to the creator is deciding how much variation from established norms should be included. That degree of variation is not only a function of the artist's inventiveness but how far they are willing to stretch the boundaries of the existing form. If they only stray a small bit, the work risks being perceived as too derivative and being ignored -- FAILURE. If they stray so far from norms that the audience cannot recognize the work within the established medium and cannot adjust to it without EXTENSIVE exposure, the work will be ignored as too bizarre -- FAILURE.

In contrast, if the artist's mastery of current norms is so perfected they can include new alterations which pay homage to or mock those norms while still embodying them, audiences are more likely to recognize the creativity, absorb the work and gain exposure to the idea the artist was attempting to communicate -- SUCCESS.

All of the above is a very abstract, intellectual means of describing something we all intuitively understand about any of the art we enjoy. No one wants to turn on their favorite radio station or open their playlist and find every song sounding 95% like their favorite band. The same is true for literary or visual arts. Readers traditionally expect a novel to have a beginning, a story arc over a primarily linear timeline and an end. Reading a story like Slaughterhouse-Five by Kurt Vonnegut that has no temporal sequence at all is an odd, even disorienting experience because it strays so far from conventions. However, that altered form was uniquely suited to the unique idea Vonnegut wanted to convey, an idea that only clicks at the random point he chose to use in the novel to convey it.

Our self-defense wiring is optimized to search for novelty as a signal of danger or opportunity, but we are also wired with incredible pattern matching capabilities that act as mental shortcuts / time-savers to quickly associate common inputs with required responses. In essence, art is a means of entertaining ourselves by "hacking" these contradictory processes as they operate simultaneously: attracting attention through novelty yet quickly conveying ideas by leveraging familiarity with prior trained inputs.

But how does all of the above tie to concerns about artificial intelligence and art? Or artificial intelligence and engineering?

Much if not all of the discussion about the use of artificial intelligence in art focuses on its impact on the final PRODUCT of an artist and that product's consumption and acceptance by an audience. Left unaddressed is the impact of AI on the PROCESS of creating art, and that process's impact on the artist and eventually the audience. Writing a symphony, recording an album, writing a screenplay or painting a mural can take enormous amounts of time done "the old way" without AI. Matching audio to video scenes... Selecting final takes from 64 tracks and setting levels, effects and balance to merge into a stereo mix... Considering plot flow while also factoring in stage blocking or scenery in a theatre or movie set...

Training AI algorithms by scanning thousands of hours of prior albums or films to identify production patterns to recycle CAN save an artist or engineer enormous amounts of time. But doing so can actually rob time from the creator as well -- time they were spending thinking about the idea being conveyed or problem being solved while they were doing the grunt work. Time that might have resulted in a refinement or perfection of an idea that can transform the larger work. It is perhaps THAT impact of AI on creativity and art / engineering that is the hardest to quantify but will prove most impactful -- to our individual and collective detriment -- in the long run.

