Current Top AI Applications
Fast and efficient transportation, seamless and instant communication, high-resolution images and video: all of our technological advancements share the goal of improving our daily lives. Today, those advancements take the form of artificial intelligence, able to understand us and assist us with even the most mundane tasks. This AI has seemingly changed our world overnight. Here's what it is, how it works, and why it matters.
ChatGPT
The first in this list is ChatGPT, often considered the face of modern-day AI. ChatGPT is a chatbot able to answer a variety of user prompts, from translating foreign languages to writing Bible verses about sandwiches. It's accessible, helpful, and fun, making it one of the most popular AI applications today.
How Does It Work?
With its impossibly futuristic uses, it's only natural to wonder how ChatGPT works. ChatGPT runs on a GPT model, which acts as something like its brain. GPT stands for Generative Pre-trained Transformer. A system of this type is trained on millions of pieces of text, and it learns patterns from that data in order to predict the next word in a sentence. It's like an advanced form of Google's autocomplete feature, except it finishes its own sentences rather than the user's. This is what lets it produce coherent paragraphs in response to users. It also means that although the AI responds coherently, it does not actually "know" anything. It only knows how to string words together in a way that makes it seem like it does.
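To make the "advanced autocomplete" idea concrete, here is a deliberately tiny Python sketch. It is nothing like GPT's actual internals (GPT uses a large neural network trained on vastly more text), but it captures the core habit: predicting the next word purely from patterns in past text. The miniature corpus below is made up for illustration.

```python
from collections import Counter, defaultdict

# A toy "next-word predictor": count which word tends to follow which word
# in a tiny training corpus, then always pick the most common follower.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    # Return the word most often seen after `word` during training.
    return followers[word].most_common(1)[0][0]

sentence = ["the"]
for _ in range(5):
    sentence.append(predict_next(sentence[-1]))

print(" ".join(sentence))  # e.g. "the cat sat on the cat": fluent-looking, but no understanding
```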
History
ChatGPT is only one of the many AIs created by OpenAI. It was released on November 30th, 2022, running on GPT-3.5, a slightly improved version of GPT-3. Several improvements over its predecessors allowed it to handle more complex prompts.
Its release shocked the public: people could now interact directly with AI, and they realized just how advanced the technology had become. Only two months later, in January of the following year, it reached 100 million users at record-breaking speed. March brought two critical events. On March 1st, OpenAI released the ChatGPT API, allowing other developers to use the AI in their own applications. Popular applications like Quizlet and Snapchat were some of the first to do so. The second was the release of GPT-4 on March 14th, although it is currently only available to users via subscription. One notable improvement is that users can include images in their prompts rather than only text, as earlier versions required.
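For developers, calling the new API took only a few lines of code. Below is a minimal sketch using OpenAI's official Python library roughly as it looked around the API's 2023 launch; the exact interface has shifted between library versions, and the API key and prompts here are placeholders.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # issued from the OpenAI account dashboard

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # the model that powered ChatGPT at launch
    messages=[
        {"role": "system", "content": "You are a helpful study assistant."},
        {"role": "user", "content": "Summarize the water cycle in two sentences."},
    ],
)

# The reply text comes back inside the first "choice" of the response.
print(response["choices"][0]["message"]["content"])
```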
Issues and Misuse
With it being so accessible and popular, ChatGPT has its share of misuse. A large-scale example is its usage in education. Students have started using it to write essays and complete assignments. In response, teachers have become stricter regarding plagiarism and even started using AI detection models to prevent cheating. However, these programs haven’t proven accurate, and students have been falsely accused.
This program has significantly impacted the development of AI and, between its changes to the education system and the applications it has revolutionized, the entire world. But OpenAI's creations don't stop at chatbots.
DALL-E
DALL-E is another artificial intelligence from OpenAI. Instead of conversing with users, it takes user prompts and creates entirely new images, and it can even adapt to different art styles. It can also build on existing pictures, editing them or generating similar images.
How Does It Work?
As one might imagine, the inner workings of an AI at this level are complex to understand and explain. To start, DALL-E uses a diffusion model. This model is trained by first inputting an image, which is then slowly distorted in steps until it is unrecognizable; for the sake of example, say it takes 50 steps. The AI then attempts to recreate the original image by undoing the distortion, which at first won't be very accurate. Distortion is then re-added to the recreation, but slightly less than before, equivalent to 49 steps in our example. The process repeats, and each pass gets closer and closer to the original image. This is how the model learns to create images out of noise (a toy sketch of the idea follows below). But then comes the leap from an image input to a text input.
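As a toy illustration of just the "distortion" half of that process, here is a short Python sketch that repeatedly adds random noise to a stand-in image array. A real diffusion model trains a large neural network to run this walk in reverse; the array size and noise level below are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
steps = 50

# Stand-in for a real training image: a 16x16 grid of pixel values in [0, 1].
image = rng.random((16, 16))

# Forward process: corrupt the image a little more at each of the 50 steps,
# until almost nothing of the original remains.
noisy = image.copy()
for t in range(steps):
    noisy = noisy + rng.normal(scale=0.05, size=noisy.shape)

# Training teaches the model the reverse walk: given an image this noisy,
# predict a slightly cleaner version, and repeat until a clean image emerges.
print("pixel spread before vs. after noising:", image.std().round(3), noisy.std().round(3))
```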
To make that leap, the developers used a model called CLIP. CLIP is trained to measure how well an image matches a given caption, which gives it a sense of the imagery associated with each word.
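OpenAI released CLIP publicly, so the matching idea is easy to try. The sketch below scores one image against two candidate captions using the Hugging Face transformers library; the checkpoint name is the public CLIP model, while the image filename and captions are placeholders.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("lobster_pot.png")  # placeholder: any local image file
captions = ["a lobster being cooked in a pot", "a cat sleeping on a sofa"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher probability means CLIP thinks that caption describes the image better.
probs = outputs.logits_per_image.softmax(dim=1)[0]
for caption, p in zip(captions, probs.tolist()):
    print(f"{p:.2f}  {caption}")
```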
DALL-E combines these two models to create an image. First, the diffusion model is given completely random distortion along with the user's text prompt. The diffusion model slowly removes the distortion while CLIP guides it, so each iteration of the loop brings the picture closer to what the user described. The result, eventually, is a finished image.
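As a very loose intuition aid, here is a toy loop in which a stand-in scoring function plays CLIP's role and small random tweaks stand in for denoising steps. Real guided diffusion is far more sophisticated, relying on the trained network rather than random search, so treat this only as a sketch of the "guidance" idea.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Stand-in for "what the caption should look like", the knowledge CLIP supplies.
target = rng.random((8, 8))

def clip_like_score(candidate):
    # Pretend CLIP: higher score when the candidate is closer to the target.
    return -float(np.mean((candidate - target) ** 2))

image = rng.normal(size=(8, 8))  # start from pure random noise
for step in range(200):
    # Propose slightly tweaked versions of the current image and keep the best,
    # so the "guidance" nudges every iteration closer to the caption's target.
    variants = [image] + [image + 0.05 * rng.normal(size=image.shape) for _ in range(8)]
    image = max(variants, key=clip_like_score)

print("distance from target:", -clip_like_score(image))
```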
Image created using Bing Image Creator, powered by DALL-E 3. Prompt: "Cooking a Lobster."
Limitations
Despite its abilities, DALL-E still has room to improve. For example, the picture above was generated by DALL-E 3. At first glance it may seem normal, and frankly, I'd be fooled. But there are still mistakes. Take a look at the handles of the pot. There also appears to be an extra antenna along the left edge.
DALL-E struggles in specific areas. One is positioning: given the prompt "Red block on top of a blue block," it has trouble telling which block goes where, and often merges them into a single cube with both colors. It also struggles with spelling and counting. It can't reliably create signs or other text, often producing gibberish, and when a specific number of objects is requested, such as "10 apples," it usually displays a random number of apples instead. Even with its high-tech capabilities, it's funny how it stumbles on kindergarten-level tasks.
History
The minds behind DALL-E are once again at OpenAI, who derived its name from the artist Salvador Dalí combined with Pixar's adorable robot, WALL-E. The first iteration of DALL-E launched on January 5th, 2021. In April 2022, DALL-E 2 was released, completely overhauled from the original; it uses the systems covered above. Most recently, on September 13, 2023, DALL-E 3 was announced, promising to be more accurate and efficient than its predecessors. It also includes a better safety net, preventing the AI from generating public figures or illegal, explicit, or discriminatory content. Usage is paid on the OpenAI website, or free through Bing Image Creator with a Microsoft account.
Controversies over AI Art
With its popularity come public concerns about how it was built. To train AI, developers must use existing data, such as other people's art and images, and because of the sheer amount needed, they look to the internet for training material. As a result, OpenAI has been accused of breaking copyright law and using people's art without permission. However, the technology developed quickly while the laws around it did not: developers do not have to be transparent about their training materials, which makes legal cases against them difficult.
Others argue that AI is simply doing what humans do. It takes inspiration from art posted publicly online and uses it to create something new. The output of the AI is entirely different from what’s put in, so what’s the harm?
With its newest addition to the roster, OpenAI has opened a way for artists to remove their works from the training material. While not a perfect solution, it is a step in the right direction.
Critics also argue that AI art devalues human artists' work. If AI can create art just like people, for free, why would anyone commission an artist? Users have started to pass off AI art as their own work, with some comparing themselves to actual artists and others even profiting from it. For example, Jason Allen submitted an AI-generated image to an art competition at the Colorado State Fair and ended up winning a $300 prize. He believes he hasn't cheated, but the backlash from competitors and other artists says otherwise.
Current AI art is generally still identifiable and has its limitations, but every day it grows more capable. Someday, it may be indistinguishable from genuine, human-made art.
The rise of AI has changed how we define art, but it has brought shifts elsewhere as well. Two more examples are the field of voice acting and the world of cons and scams.
ElevenLabs
According to the application's own description, ElevenLabs is one of the most popular and realistic generative voice AI tools, built with artificial intelligence and machine learning for voiceovers and text-to-speech reading. According to the website, the current models as of September 2023 are English v1, Turbo v2, Multilingual v1 (experimental), and Multilingual v2. Multilingual v2 supports 28 languages, a clear step up from the earlier, experimental Multilingual v1. English v1 was ElevenLabs' first model and is also its most limited, while Turbo v2 is the latest, lowest-latency, and best model for efficient real-time English speech generation.
Usage
Upon registering on the website, you receive a generated API key that gives you access to the ElevenLabs models included in your package, letting you generate audio and integrate it into a Python application on request. As generative voice AI software, it can clone a voice from a user's uploaded audio, build custom voices, or use its library of pre-made voices, and it lets users edit and mix the result with the program's speech synthesizer. It's mainly used for content creation, generating and replicating voices for audiobooks, speeches, and other audio content; AI voice covers and voice parodies are its most popular uses.
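As a rough sketch of what that integration can look like, the Python example below sends text to ElevenLabs' REST text-to-speech endpoint and saves the returned audio. The endpoint path, header, and field names are recalled from the public documentation and may have changed, and the API key and voice ID are placeholders.

```python
import requests

# Assumed from ElevenLabs' public REST docs (details may differ by version):
# POST /v1/text-to-speech/{voice_id} with your account key in the xi-api-key header.
API_KEY = "YOUR_ELEVENLABS_KEY"   # issued when you register on the site
VOICE_ID = "YOUR_VOICE_ID"        # a pre-made voice or one you have cloned

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={
        "text": "Welcome back, adventurers!",
        "model_id": "eleven_multilingual_v2",  # one of the models listed above
    },
)
resp.raise_for_status()

# The response body is the rendered audio, ready to save or play.
with open("line.mp3", "wb") as f:
    f.write(resp.content)
```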
Popular gaming and coding YouTuber DougDoug streamed his use of ElevenLabs and ChatGPT to make an AI depiction of the character Pajama Sam narrate and play his own point-and-click adventure game in real time (using an API key in Python).
Functionality
Unlike the text-based AI programs covered so far, this one stands out because it generates audio, and its purpose is to sound as close to the given voice as possible. The process starts with prior training: the AI's learning algorithms examine recordings of real human voices, with the data tailored to each model's capabilities. One aspect of its functionality is a language model for producing emotion, built by analyzing text closely to understand the tone and feeling the audio should convey. You can hear this in the way it adds breaths mid-sentence to sound human, pauses within sentences, and elongates words for emphasis. Its knowledge base only grows with more user prompts and feedback.
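As a toy illustration (not ElevenLabs' actual method), the sketch below shows how cues in the text itself could drive pauses in generated speech; the pause lengths are invented values.

```python
def prosody_plan(text: str):
    # Map each word to a pause length in milliseconds based on its punctuation,
    # mimicking how a voice model might slow down at sentence and clause boundaries.
    plan = []
    for word in text.split():
        if word.endswith((".", "!", "?")):
            plan.append((word, 400))   # long pause after a sentence
        elif word.endswith(","):
            plan.append((word, 150))   # brief pause after a comma
        else:
            plan.append((word, 30))    # normal inter-word gap
    return plan

print(prosody_plan("Well, that was close. Let's keep moving!"))
```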
History
The London startup’s idea of generating voices for content came to childhood friends Staniszewski and Piotr Dabkowski watching poor dubbing of American movies in Poland. ElevenLabs initiated its first release around January 2023, and upon that release, it grossed 19 million dollars from investors.
Controversy
As with most AI controversies, users' misuse of the voice-generating tool has consequences for anyone whose voice is published online. Much of this section concerns "deepfakes," content in which a recognizable quality of a person is digitally replicated or altered, in cases of misuse without the person's permission. Voice actors have been vocal in their disapproval of their voices being used in content they never consented to. Even you can be affected by this tool: scammers can use AI to clone your voice and trick your family and friends into wiring money to someone they think is you, or manipulate your voice to say controversial or harmful things and ruin your reputation. ElevenLabs has posted a thread encouraging users to report misuse of the program and explaining why it placed voice cloning behind a paywall, so that malicious content can be traced back to the users who made it.
Because of its ever-growing knowledge base, generative voice AI will only improve with what we feed it, so when we're handed such a powerful tool, it's important to use it to its full capabilities. The proficiency and quality of what ElevenLabs generates will undoubtedly contribute to the advancement of our technological world, in areas like voice recognition and content creation.
Siri
Siri is an AI owned by Apple and used across all iOS devices as a simple assistant. Its purpose is to help people with everyday tasks: it's handy when someone needs a quick question answered, a timer set for a certain amount of time, or a call placed while their hands are busy.
Here is a picture of what Siri can do. As you can see, it can easily set up schedules, making it a great assistant.
Functionality/Usage
In summary, Siri uses machine learning and data from the user to figure out what the user wants. It records the frequencies and sound waves of your voice and translates them into data the phone can work with. That data is then run through a complex algorithm that sifts through thousands of possible interpretations, and once a match is found, the phone knows what to do. The code that runs Siri has never been publicly released, but here is an example of what a simplified version might look like.
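This Python sketch is purely hypothetical and only shows the general shape of the last step: matching a transcribed request to an action.

```python
# Purely hypothetical sketch: Siri's real code is not public. This only shows
# the general idea of turning transcribed speech into an "intent", then acting on it.
import re

def handle(transcript: str) -> str:
    text = transcript.lower()
    if match := re.search(r"timer for (\d+) (second|minute|hour)s?", text):
        return f"Starting a timer for {match.group(1)} {match.group(2)}s."
    if match := re.search(r"call (\w+)", text):
        return f"Calling {match.group(1).title()}..."
    if "weather" in text:
        return "Here's today's forecast."
    return "Sorry, I didn't catch that."

print(handle("Hey, set a timer for 10 minutes"))  # Starting a timer for 10 minutes.
print(handle("Call mom"))                         # Calling Mom...
```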
The Controversy of Siri
Even though Siri is a great AI, there are many privacy concerns. The main controversy was that Siri quietly captured people's conversations: recordings, which sometimes contained drug deals and other private or criminal activity, were reviewed without users' knowledge. Apple never told anyone about this, which sparked a lawsuit that is still being settled. Other mobile assistants are facing backlash for similar problems.
History
Siri began in 2003 as a research project at SRI International, a research institute that originated at Stanford University, with the goal of creating a virtual assistant that could help the user with everyday tasks. The project was brought to market by Dag Kittlaus, who had previously worked at companies such as Motorola, another major name in the mobile industry. In 2007, Kittlaus and his fellow researchers founded the company Siri Inc. Apple purchased it in 2010, built Siri into the iPhone the following year, and the rest is history.
Takeaway
Science fiction is getting more and more real every day. AI is still in its infancy, but we've opened Pandora's box. While AI can help people, it's also being used in dangerous and harmful ways: it can spread misinformation, listen to everything you say, and even replicate your voice. We have to stay on guard against these dangers. Only once we are safe can we push the limits of what is possible.