Hi readers, you must have been wondering how important data, its type, and its quality really are, and where this data comes from, how it is stored, and where it is used.
Please keep in mind that I am writing these blogs to spread awareness, especially among those who are familiar with the terms but not with the actual subject. Understanding it is important for making sense of what is happening around us and for avoiding being fooled.
So, the question is: where does the data come from?
Simple. It is pouring in from all sides, round the clock. For example:
- Streaming data comes from the Internet of Things (IoT): physical objects, or groups of objects, equipped with sensors, processing ability, software, and other technologies that connect and exchange data with other devices and systems over the internet. Data also flows in from other communication networks and connected devices into IT systems equipped with specialized software, such as devices for precision, smart, and digital agriculture; telematics; articles of clothing; smart cars; medical devices; and industrial equipment. This data can be analyzed immediately to decide which data is required now, which is to be discarded, and which is to be kept for future use,
- Social media also sends data into computers, based on Facebook interactions, YouTube and Instagram stories, etc. It comprises videos, images, voice and text messages, and sounds that are useful for marketing, sales, and support functions,
- Publicly available data comes from public sector organizations' official portals and other open data sources,
- Cloud sources, i.e., data uploaded from external sources into the system, which can be downloaded, analyzed, and used for report writing (like I am writing this blog), and/or by suppliers who may use it to promote their business,
- Big data comes from a centralized repository that stores all structured and unstructured data, at any scale and in its original form. Thus, the data has so many dimensions that even an expert "data scientist" cannot comprehend it precisely and handle it effectively.
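To make the streaming example concrete, here is a tiny sketch of how a program might triage incoming readings into "use now", "keep for later", and "discard". The sensor names and thresholds below are entirely made up for illustration:

```python
# Hypothetical sensor readings arriving from IoT devices.
readings = [
    {"sensor": "soil-moisture", "value": 12},
    {"sensor": "soil-moisture", "value": 55},
    {"sensor": "engine-temp", "value": 131},
]

act_now, archive = [], []
for r in readings:
    if r["sensor"] == "engine-temp" and r["value"] > 120:
        act_now.append(r)   # needed immediately (e.g., an overheating alert)
    elif r["value"] > 50:
        archive.append(r)   # kept for future analysis
    # everything else is discarded

print(len(act_now), len(archive))  # 1 1
```

A real streaming pipeline would make these keep/discard decisions continuously, on far larger volumes, but the idea is the same.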
How is big data accessed, managed, and stored?
Accessing, managing, and storing data is extremely important because future decision-making depends on analyses of this data. Today's state-of-the-art computing systems can quickly access huge amounts and types of data, which can be stored in:
- Data warehouse: a type of data management system designed to enable and support business intelligence (BI) activities, especially analytics,
- Cloud: a cloud data warehouse has no physical hardware; it is software used as a service,
- Data lakes: a system or repository of data stored in its natural/raw format, usually object blobs or files.
- Hadoop: an open-source framework that is used to store and process large datasets ranging in size from gigabytes to petabytes.
Data can easily be accessed from these repositories and used for various purposes, such as Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP), and training AI "bots" for data analytics (a bot, or internet robot, is a software application that runs automated tasks over the internet, usually with the intent to imitate human activity, such as messaging). Data can also be archived for later use as customized bots become available. It is anticipated that by 2050, most big data will be "real time," of which 95% will come from IoT. As this data is continuously used to generate stronger insights for ML and NLP (which are the basis of AI), the norms of business are transforming accordingly.
What is Artificial intelligence (AI)?
AI is the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. It is the "science and engineering of making intelligent machines" that automate tasks requiring intelligent behavior, for example:
- Control systems,
- Automated planning and scheduling,
- Ability to answer diagnostic and consumer questions, and
- Handwriting, speech, and facial recognition.
In this way, it can be called an engineering discipline focused on providing solutions to real-life problems, software applications, traditional strategy games, and various video games. The capability of a machine to imitate intelligent human behavior was anticipated to be used as a tool for replacing human involvement in analysis and decision-making, not because of a lack of trust in human beings, but because huge data sets can be beyond their capability to handle, analyze, and comprehend.
Artificial intelligence is exhibited by machines that recognize their environment and take actions that maximize their chances of successfully achieving their goals. Thus, it is usually used to describe computers that mimic and/or imitate cognitive functions: the ways in which humans process information and think about the world, which include learning, thinking, knowing, remembering, judging, and problem-solving, and encompass the processes associated with language, imagination, perception, and planning.
AI techniques Include:
Knowledge representation and reasoning (KR&R) is a field dedicated to representing information about the world in a form that a computer can utilize to solve complex tasks, such as diagnosing a medical condition or having a dialog in natural language.
Automated planning and scheduling, or simply AI planning, is a branch that deals with the realization of strategies, or action sequences, typically for execution by intelligent agents, autonomous robots, or unmanned vehicles.
Natural language processing (NLP) deals with the interactions between computers and human (natural) languages, particularly how to program computers to process and analyze large amounts of natural language data. A familiar application is providing customer service on a webpage, and
Computer vision (CV) is an analog of NLP for visual data, used for processing digital media such as video, alongside speech recognition and expert systems that simulate human judgment.
Today’s state of the art in AI includes:
Machine learning (ML) is the study of computer algorithms that improve automatically through experience.
Machine learning algorithms build mathematical models based on sample data, known as training data, to make predictions or decisions without being explicitly programmed to do so (this is where the danger is mostly anticipated).
It is an application of AI that enables systems to automatically learn and improve from experience, and it focuses on the development of computer programs that can access data and use it to learn for themselves. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
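As a toy illustration of the email-filtering example, a program can "learn" from labelled messages simply by counting words per label and then scoring new messages against those counts. The messages, labels, and scoring rule below are all invented for the sketch; real filters use far more sophisticated models:

```python
from collections import Counter

# Hypothetical training data: a few labelled messages.
training_data = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]

# "Learning": count how often each word appears per label.
word_counts = {"spam": Counter(), "ham": Counter()}
for text, label in training_data:
    word_counts[label].update(text.split())

def classify(text):
    """Predict the label whose training words overlap the message most."""
    scores = {
        label: sum(counts[word] for word in text.split())
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)

print(classify("free prize money"))     # spam
print(classify("monday team meeting"))  # ham
```

Notice that nobody wrote a rule saying "free prize" is spam; the behavior came entirely from the training data, which is exactly why the quality of that data matters so much.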
Cognitive computing (CC) is the simulation of human thought processes in a computerized model. Cognitive computing involves self-learning systems that use data mining, pattern recognition, and natural language processing to mimic the way the human brain works.
Robotic process automation (RPA) enables organizations to configure computer software, or a "bot," to capture and interpret existing applications for processing a transaction, manipulating data, triggering responses, and communicating with other digital systems.
Pattern recognition is a program that automatically compares patterns stored in the machine's memory with the patterns the machine sees.
For example, a vision program may try to match a pattern of eyes and a nose in a scene to find a face.
More complex patterns, e.g., in a natural language text, in a chess position, or in the history of some events are also studied. These complex patterns require different methods than do the simple patterns that have been studied the most.
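The face-matching idea can be sketched in a few lines: slide a stored pattern (here, a crude "eyes and nose" drawn with characters) over a scene and report where every row matches. This is only a toy illustration of template matching, not how real vision systems work:

```python
# A stored "face" pattern: two eyes over a nose, drawn with characters.
pattern = [
    "o o",
    " ^ ",
]

scene = [
    ".......",
    "..o o..",
    ".. ^ ..",
    ".......",
]

def find_pattern(scene, pattern):
    """Return the (row, col) where the pattern first matches, or None."""
    ph, pw = len(pattern), len(pattern[0])
    for r in range(len(scene) - ph + 1):
        for c in range(len(scene[0]) - pw + 1):
            if all(scene[r + i][c:c + pw] == pattern[i] for i in range(ph)):
                return (r, c)
    return None

print(find_pattern(scene, pattern))  # (1, 2)
```

The "more complex patterns" mentioned above (language, chess positions, event histories) need statistical methods rather than this kind of exact matching.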
Big data and Artificial Intelligence
As mentioned earlier, big data can be utilized only if useful insight can be found in it, and artificial intelligence is the only technique that can do this job, not only for big data but also for other ordinary, training, and/or trivial data that is beyond human comprehension and capacity. This is one of the reasons that big data has developed a special relationship with AI. Previously, AI was given a fixed set of directions to follow; this has now been replaced with machine learning techniques, which are used to train AI chatbots that collect large data sets and make sense of them.
For example, AI chatbots can be trained on data sets that contain text recordings of human conversations collected from messenger apps, to learn how to understand what humans say and require and to come up with appropriate responses.
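A drastically simplified sketch of that idea: match the user's message against recorded conversation pairs and reply with the best match. The recorded conversations and the word-overlap rule are invented for illustration; real chatbots learn from millions of conversations with far more sophisticated models:

```python
# Hypothetical recorded (prompt, reply) pairs a chatbot might learn from.
conversations = [
    ("what time do you open", "We open at 9 am every day."),
    ("how do i reset my password", "Use the 'Forgot password' link on the login page."),
    ("thanks for your help", "You're welcome! Anything else I can do?"),
]

def respond(message):
    """Reply using the recorded prompt that shares the most words with the message."""
    words = set(message.lower().split())
    best = max(conversations, key=lambda pair: len(words & set(pair[0].split())))
    return best[1]

print(respond("I need to reset my password"))
```

Even this toy version shows the dependency clearly: the bot's answers are only as good, and only as trustworthy, as the conversations it was trained on.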
This is just one example; there are millions of such chatbots gathering huge amounts of data from all over the world through various apps. This data is the raw material for AI algorithms and a central pillar on which AI is based. It powers AI to learn and understand what is happening around the globe and to prepare appropriate responses.
AI encoding is based on:
- Learning, which means acquiring data and converting it into algorithms for turning the data into actionable information,
- Reasoning, which is based on selecting the right algorithm to achieve the desired results, and
- Self-correction, which is based on continuously modifying and/or adjusting the design of algorithms so that the most accurate result can be achieved.
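Under very simplified assumptions, all three steps can be seen in a toy fitting loop: the program acquires data (learning), measures how wrong its current model is (reasoning), and adjusts the model to reduce the error (self-correction). The data and update rule here are made up for the sketch:

```python
# Toy self-correction loop: fit y = w * x to data by nudging w.
data = [(1, 2), (2, 4), (3, 6)]  # true relationship: y = 2x
w = 0.0
for _ in range(200):
    for x, y in data:
        error = w * x - y      # reasoning: how wrong is the current model?
        w -= 0.05 * error * x  # self-correction: adjust the model
print(round(w, 2))  # converges near 2.0
```

The program was never told that the answer is 2; it corrected itself toward it, step by step, using only the data.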
AI uses data to learn as it is directed by the software.
The question is: what if this data is poisoned?
Hackers and cybersecurity breachers can tamper with the data sets used to train AI "chatbots" and direct them to behave the way they like.
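To see why poisoning matters, here is a toy demonstration with an invented word-count classifier: injecting a few mislabeled examples into the training set flips the model's answer, without touching a single line of its code. Real poisoning attacks and real models are far more subtle, but the principle is the same:

```python
from collections import Counter

def train(samples):
    """'Learn' by counting how often each word appears per label."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in samples:
        counts[label].update(text.split())
    return counts

def classify(counts, text):
    """Predict the label whose training words overlap the message most."""
    scores = {label: sum(c[w] for w in text.split()) for label, c in counts.items()}
    return max(scores, key=scores.get)

clean = [("free prize money", "spam"), ("team meeting agenda", "ham")]
# An attacker injects mislabeled examples into the training set:
poisoned = clean + [("free prize money now", "ham"),
                    ("free money prize offer", "ham")]

print(classify(train(clean), "free money"))     # spam
print(classify(train(poisoned), "free money"))  # ham
```

Same code, different data, opposite answer: that is exactly why tampering with training data is such a serious threat.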
See you next time with an amazing blog on this topic.
Till then, enjoy reading this blog, and please do comment if something is missing and/or not clear.
Bye