The relationship among cloud computing, big data, and artificial intelligence

Today let's talk about cloud computing, big data, and artificial intelligence. These three terms are extremely popular right now, and they seem closely related to one another.

Discussions of cloud computing usually bring up big data, discussions of big data lead to artificial intelligence, and discussions of artificial intelligence circle back to cloud computing. The three seem mutually reinforcing and inseparable.

For non-technical readers, however, the relationship among the three can be hard to grasp, so it is worth explaining.

The initial goal of cloud computing

Let's start with cloud computing. The original goal of cloud computing was to manage resources, mainly computing resources, network resources, and storage resources.

Managing a data center like a computer

What are computing, network, and storage resources?

Say you want to buy a laptop. Don't you care what CPU it has and how much memory? Those two are what we call computing resources.

To get online, the laptop needs a network port you can plug a cable into, or a wireless card that can connect to your home router.

Your home also needs a subscription from a carrier such as China Unicom, China Mobile, or China Telecom, say 100M of bandwidth. A technician then runs a network cable to your home and may help you configure your router to connect to the carrier's network.

After that, all your computers, phones, and tablets can get online through the router. These are network resources.

You may also ask how big the hard disk is. Hard disks used to be tiny, around 10G; later 500G, 1T, and even 2T disks became commonplace (1T is 1000G). That is the storage resource.

What holds for one computer holds for a data center too. Imagine a very large room full of servers; these servers also have CPUs, memory, and hard disks, and they also get online through router-like devices.

The question then becomes: how do the people operating the data center manage all these devices in a unified way?

Flexibility: whatever you want, whenever you want it

The goal of this management is to achieve flexibility in two dimensions. Which two?

An example makes it concrete: suppose someone needs a tiny computer with just one CPU, 1G of memory, a 10G hard disk, and 1M of bandwidth. Can you give it to him?

A machine that small is weaker than any ordinary laptop today, and a typical home broadband line is already 100M. Yet on a cloud computing platform, getting exactly this resource takes only a click.

In that case the platform achieves flexibility in two dimensions:

Time flexibility: whenever you need a resource, it is available right away.

Space flexibility: however much you need. Need a tiny computer? That can be satisfied. Need an enormous amount of space, like a cloud drive that allocates each user a huge quota, ready for uploads at any time and seemingly never full? That can be satisfied too.

Space flexibility and time flexibility are what we usually call the elasticity of cloud computing. Solving this elasticity problem took a long period of development.

Physical devices are not flexible

The first stage was the era of physical equipment. When a customer needed a computer during this period, the data center bought one.

Physical equipment has certainly become more and more powerful:

Servers, for example, now have hundreds of gigabytes of memory.

Network devices can offer tens or even hundreds of gigabits of bandwidth on a single port.

Storage in a data center is at least at the PB level (one P is 1000 T, and one T is 1000 G).

However, physical devices do not offer good flexibility:

First, they lack time flexibility: you cannot get one the moment you want it. Buying a server or a computer takes procurement time.

If a user suddenly tells a cloud vendor that he wants a machine and a physical server has to be used, procurement is slow: with a good supplier relationship it may take a week, with an ordinary one perhaps a month.

The user waits a long time for the machine to arrive, and only then can he log in and slowly start deploying his application. Time flexibility is very poor.

Second, space flexibility does not work either. The user above needs a very small computer, but does such a small model even exist? You cannot buy a machine small enough to match a request for just 1G of memory and an 80G hard disk.

If you buy a big machine instead, you have to charge the user more because the machine is larger, but the user only wants to pay the small amount that matches his needs, so asking him to pay more is awkward.

Virtualization is much more flexible

So people came up with a solution. The first approach was virtualization. Doesn't the user just want a small computer?

The physical equipment in the data center is very powerful, so a small virtual slice of the physical CPU, memory, and hard disk can be carved out for one customer, and another small slice for other customers.

Each customer sees only their own small slice, while in reality every customer is using a small piece of the same large device.

Virtualization makes different customers' computers appear isolated from one another: it looks to me like this disk is mine and to you like that disk is yours, when in fact my 10G and your 10G may sit on the same huge storage system.

Moreover, if the physical devices are prepared in advance, carving out a virtual computer with software is very fast, a matter of minutes. That is why creating a computer on any cloud takes only a few minutes.

With that, space flexibility and time flexibility are basically solved.

Money and idealism in the virtualization world

In the virtualization era the best-known company was VMware, an early implementer of virtualization technology covering compute, network, and storage.

The company did very well: its virtualization software sold extremely well and earned a lot of money, and it was later acquired by EMC (a Fortune 500 company and the leading brand among storage vendors).

But there are still plenty of idealists in this world, especially among programmers. What do idealists like to do? Open source.

Much of the world's software is either closed source or open source. "Source" means source code. Closed source means: my software is good and everyone loves using it, but I keep the code to myself; only my company knows it, and nobody else does.

If others want to use the software, they have to pay me. That is closed source. But there are always people who cannot stand seeing technology used only to make money. These experts think: you can master this technology, and so can I; you can develop it, and so can I.

When I develop it, I charge nothing; I share the code with everyone, anyone in the world can use it, and everyone can enjoy the benefits. That is open source.

Tim Berners-Lee, for example, is such an idealist. In 2017 he received the 2016 Turing Award for "inventing the World Wide Web, the first browser, and the fundamental protocols and algorithms allowing the Web to scale."

The Turing Award is the Nobel Prize of computer science. What is most admirable, though, is that he gave the World Wide Web, the WWW technology we all use, to the world free of charge.

For everything we do on the Internet today we should thank him; had he charged for this technology, he would probably be about as rich as Bill Gates.

There are many examples of open source versus closed source. In the closed-source world there is Windows, for which you pay Microsoft; in the open-source world, Linux appeared.

Bill Gates made a fortune from Windows, Office, and other closed-source software and became the world's richest man, so experts developed another operating system, Linux.

Many people may never have heard of Linux, yet many back-end server programs run on it. For example, the Double Eleven shopping festival everyone enjoys, whether on Taobao, JD, or Kaola, is served by systems running on Linux.

Likewise, there is Apple and there is Android. Apple's market value is very high, but we cannot see the code of Apple's system, so experts wrote the Android operating system.

That is why almost every other phone maker ships Android: Apple's system is not open source, while anyone can use Android.

The same happened with virtualization software. VMware's software is very expensive, so experts wrote two open-source virtualization projects, one called Xen and the other called KVM. If you are not in the technology field you can ignore these two names, but they will come up again later.

From semi-automatic virtualization to fully automatic cloud computing

Saying that virtualization software solved the flexibility problem is not entirely accurate, because when virtualization software creates a virtual computer, someone generally has to specify manually which physical machine it should be placed on.

The process may also require fairly complex manual configuration. That is why using VMware's virtualization software calls for serious certification, the people who hold such certificates command high salaries, and the complexity is evident.

As a result, the cluster of physical machines that virtualization software alone can manage is not particularly large, generally on the order of a dozen, dozens, or at most hundreds of machines.

This hurts time flexibility: although creating a single virtual computer is quick, as the cluster grows the manual configuration becomes more and more complex and time-consuming.

It also hurts space flexibility: when user numbers are large, the cluster size falls far short of what is needed, the resources are quickly exhausted, and more hardware has to be purchased.

So clusters kept growing, starting at thousands of machines and reaching tens of thousands or more. Look at BAT, NetEase, Google, or Amazon: their server counts are staggering.

With that many machines it is practically impossible to rely on humans to pick a spot for each virtual computer and do the corresponding configuration; machines have to do it.

People invented various algorithms for this job, collectively known as the scheduler.

In plain terms, there is a dispatch center with thousands of machines in one pool. However many CPUs, however much memory and disk the user asks for, the dispatch center automatically finds a spot in the big pool that can satisfy the request, starts the virtual computer there, and configures it so the user can use it directly.
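As a rough illustration of what such a scheduler does, here is a minimal first-fit placement sketch in Python. The host names, capacities, and the request are made-up values; real schedulers (in OpenStack, Kubernetes, and the like) weigh many more factors.

```python
# Minimal first-fit scheduler sketch: pick the first host in the pool that still
# has enough free CPU, memory, and disk for the requested virtual machine.
hosts = [
    {"name": "host-01", "cpu": 64, "mem_gb": 256, "disk_gb": 4096},
    {"name": "host-02", "cpu": 32, "mem_gb": 128, "disk_gb": 2048},
]
used = {h["name"]: {"cpu": 0, "mem_gb": 0, "disk_gb": 0} for h in hosts}

def schedule(request):
    """Return the name of a host that can fit the request, or None."""
    for h in hosts:
        u = used[h["name"]]
        if all(u[k] + request[k] <= h[k] for k in ("cpu", "mem_gb", "disk_gb")):
            for k in ("cpu", "mem_gb", "disk_gb"):
                u[k] += request[k]          # reserve the resources on this host
            return h["name"]
    return None                             # pool is full; time to buy more servers

print(schedule({"cpu": 1, "mem_gb": 1, "disk_gb": 10}))   # the tiny VM from earlier
```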

This stage is called pooling, or cloudification. Only at this stage can it be called cloud computing; before that it is merely virtualization.

Private and public cloud computing

Cloud computing comes in two broad flavors: private cloud and public cloud. Some people also connect a private cloud to a public cloud to form a hybrid cloud, which we will set aside for now.

Private cloud: the virtualization and cloud-management software is deployed in the customer's own data center. Private-cloud users are often wealthy: they buy land, build machine rooms, buy their own servers, and then have a cloud vendor deploy the software for them.

VMware, beyond virtualization, later launched cloud computing products as well and has earned a great deal in the private cloud market.

Public cloud: the virtualization and cloud-management software is deployed in the cloud vendor's own data center, so users need no big up-front investment; registering an account and clicking through a web page is enough to create a virtual computer.

For example, AWS is Amazon's public cloud; in China there are Alibaba Cloud, Tencent Cloud, NetEase Cloud, and so on.

Why did Amazon build a public cloud? Amazon started as one of the larger e-commerce companies abroad, and any e-commerce business inevitably runs into a Double Eleven-like scenario: at certain moments everyone rushes to buy at once.

Those rush moments demand exactly time flexibility and space flexibility. The company cannot keep all the resources ready all year round, which would be far too wasteful, but neither can it prepare nothing and watch countless shoppers fail to get onto the site during the rush.

So at each peak it creates a large number of virtual computers to carry the e-commerce application, and after the peak it releases those resources for other work. Amazon therefore needed a cloud platform.

Commercial virtualization software, however, is simply too expensive; Amazon could not hand over all the money it earned from e-commerce to virtualization vendors.

So Amazon built its own cloud software on top of the open-source virtualization technologies mentioned above, Xen and KVM. Unexpectedly, Amazon became better and better at cloud software, and its cloud platform grew more and more impressive.

Because its cloud platform had to support its own e-commerce applications, while traditional cloud vendors were mostly IT companies with almost no applications of their own, Amazon's platform turned out to be more application-friendly, quickly grew into the leading brand in cloud computing, and made a great deal of money.

Before Amazon disclosed its cloud platform financials, people speculated: Amazon's e-commerce makes money, but does the cloud? When the report finally came out, it turned out the cloud was not just profitable but enormously so: last year alone AWS revenue reached US$12.2 billion with an operating profit of US$3.1 billion.

Money and idealism in cloud computing

The number one in public cloud was living very comfortably, while the number two, Rackspace, was doing only so-so. Such is the cruelty of the Internet industry, which is largely winner-take-all; people outside the cloud computing industry may never even have heard of the number two.

The number two thought: if I cannot beat the leader on my own, what can I do? Open source. As noted above, although Amazon used open-source virtualization technology, its cloud-management code was closed source.

Many companies that wanted to build cloud platforms but could not manage it on their own could only watch Amazon rake in money. Once Rackspace opened its source code, the whole industry could work on the platform together: rally the brothers and take on the leader as a group.

So Rackspace and NASA co-founded the open-source project OpenStack. People outside the cloud computing industry do not need to understand its architecture diagram in detail,

but three keywords stand out: Compute, Networking, and Storage. OpenStack, too, is a cloud management platform for computing, network, and storage.

The number two's technology was, of course, solid as well. Once OpenStack appeared, just as Rackspace had hoped, all the big companies that wanted to do cloud went wild for it: IBM, Hewlett-Packard, Dell, Huawei, Lenovo, and the rest all piled in.

They had all wanted to build cloud platforms while watching Amazon and VMware make so much money, but building one from scratch by themselves looked rather hard.

Now, with an open-source cloud platform like OpenStack, all of these IT vendors joined the community, contributed to the platform, packaged it into their own products, and sold it along with their own hardware.

Some built private clouds, some built public clouds, and OpenStack became the de facto standard for open-source cloud platforms.

IaaS: resource-level flexibility

As OpenStack matured, it could manage larger and larger scale, and multiple OpenStack clusters could be deployed and then managed together.

For example, one deployment in Beijing, two in Hangzhou, and one in Guangzhou, all under unified management. The overall scale becomes much bigger.

At this scale, from an ordinary user's perspective, the cloud can essentially deliver whatever is wanted, whenever it is wanted, in whatever amount.

Take the cloud drive again as an example. Each user is allocated 5T or more of space. With 100 million users, how much space would that add up to?

The underlying mechanism is actually this: of the space allocated to you, you probably use only a small part. The 5T allocation is just what you see; it is not really reserved for you.

If you actually use only 50G, then only 50G is truly yours; as you keep uploading files, more and more real space is allocated to you.

As everyone uploads and the cloud platform finds itself nearly full (say 70%), it purchases more servers and expands the resources behind the scenes. To users this is transparent and invisible.

As far as users can perceive, the elasticity of cloud computing is achieved. It is a bit like a bank: depositors feel they can withdraw money whenever they want, and as long as they do not all run on the bank at once, the bank is never embarrassed.

To sum up

At this stage, cloud computing basically achieves time flexibility and space flexibility, that is, elasticity of computing, network, and storage resources.

Computing, networking, and storage are often referred to collectively as infrastructure. The flexibility at this stage is therefore called resource-level flexibility.

A cloud platform that manages these resources delivers infrastructure as a service, which is the IaaS (Infrastructure as a Service) we often hear about.

Cloud computing is not just resources, but applications too

Is resource-level elasticity from IaaS enough? Obviously not; there is also flexibility at the application level.

Here is an example. Suppose an e-commerce application normally needs only ten machines, but Double Eleven requires a hundred. You might think that is easy: with IaaS, just create 90 new machines.

But those 90 newly created machines are empty; the e-commerce application has not been deployed on them. The company's operations staff can only install it machine by machine, which takes a long time.

So although flexibility has been achieved at the resource level, without flexibility at the application level the overall flexibility is still insufficient. Is there a way to solve this?

People added a layer on top of the IaaS platform to manage application elasticity above the resources. This layer is usually called PaaS (Platform as a Service).

This layer is often harder to grasp. It roughly splits into two parts: what I would call "automatic installation of your own applications" and "general-purpose applications that need no installation at all."

Automatic installation of your own applications: an e-commerce application, for example, is developed by you, and nobody but you knows how to install it.

During installation the e-commerce application needs your Alipay or WeChat account configured, so that when someone buys something from your store the money goes into your account; nobody but you knows these details.

So the platform cannot perform the installation for you, but it can help you automate it; you do some work up front to fold your own configuration information into the automated installation process.

In the example above, the 90 machines created for Double Eleven are empty. If a tool can automatically install the e-commerce application onto those 90 new machines, real flexibility is achieved at the application level.

Tools such as Puppet, Chef, Ansible, and Cloud Foundry can do this, and the more recent container technology, Docker, does it even better.
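As a rough sketch of what "automatically install onto 90 new machines" can look like in the simplest case, here is a hedged Python example that pushes a start command to each host over SSH using paramiko. The host addresses, user, key path, and the `docker run` command are all placeholders, not the configuration of any real system; dedicated tools like Ansible do the same job far more robustly.

```python
# Toy deployment loop: connect to each freshly created machine and start the
# application there, instead of an operator logging in one machine at a time.
import paramiko

new_hosts = [f"10.0.0.{i}" for i in range(10, 100)]        # the 90 empty machines (placeholder IPs)
APP_CMD = "docker run -d --name shop shop-image:latest"    # hypothetical application image

for host in new_hosts:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="deploy", key_filename="/home/deploy/.ssh/id_rsa")
    _, stdout, _ = client.exec_command(APP_CMD)             # start the app remotely
    print(host, stdout.channel.recv_exit_status())          # 0 means the command succeeded
    client.close()
```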

General-purpose applications that need no installation: a general-purpose application here means something complex that everybody nevertheless uses, the database being the classic example. Almost every application uses a database, the database software is standard, and although installing and maintaining it is complicated, it is the same no matter who does it.

Such applications can be offered as standard PaaS-layer services in the cloud platform's interface. When a user needs a database, one click brings it up, ready to use.

Someone may ask: if the installation is the same no matter who does it, why not install it myself instead of paying the cloud platform? It is not that simple. Databases are genuinely hard: a company like Oracle makes enormous amounts of money from its database alone, and Oracle licenses cost a great deal.

Most cloud platforms, however, offer open-source databases such as MySQL, so you do not have to spend nearly as much.

But maintaining such a database still takes a sizeable team of experts, and tuning it to the point where it can support Double Eleven traffic is not the work of a year or two.

If you are, say, a bike-sharing startup, you certainly should not recruit a huge database team just for this; the cost is far too high. You should hand it to the cloud platform.

Let professionals do professional work: the cloud platform dedicates hundreds of people to maintaining these systems, so you can focus entirely on your bike-sharing application.

Either your application is deployed automatically, or it needs no deployment at all; either way, you no longer have to worry much about the application layer. That is the important role of the PaaS layer.

Scripts can solve the deployment problem for your own application, but environments differ: a script that runs correctly in one environment often fails in another.

Containers solve this problem better.

"Container" means both a software container and a shipping container, and the idea is exactly that: a shipping container for software delivery. Containers have two key characteristics: encapsulation and standardization.

Imagine shipping goods from A to B before shipping containers existed, with three ports and three changes of ship along the way.

At every port the goods had to be unloaded in a jumble and then carefully rearranged on the next ship, so without containers the crew had to spend several days ashore at every transfer.

With containers, all the goods are packed together and every container has the same dimensions, so at each change of ship a whole box is simply moved across. The transfer finishes in hours, and the crew no longer has to spend long stretches ashore.

That is the application of the container's two great characteristics, encapsulation and standardization, in everyday life.

How does a software container encapsulate an application? It borrows from the shipping container. First there must be a closed environment that packages the goods so that they do not interfere with one another and stay isolated, making loading and unloading easy. Fortunately, the LXC technology in Ubuntu has long been able to do this.

The closed environment mainly relies on two technologies (a small sketch follows the two items below):

One makes things look isolated and is called namespaces: each application inside its own namespace sees its own IP addresses, user space, and process IDs.

The other isolates resource usage and is called cgroups: the machine as a whole has plenty of CPU and memory, but each application is only allowed to use a portion of it.
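Here is a minimal sketch of driving these two kernel features from Python, assuming a Linux host with root privileges, util-linux's `unshare` installed, and cgroup v2 mounted at /sys/fs/cgroup. The cgroup name "demo" and the 256 MB limit are arbitrary illustration values, not anything a real container runtime would hard-code.

```python
# Namespaces: run a shell in fresh UTS/PID/mount namespaces via `unshare`;
# the hostname set inside is invisible to the host.
import os
import subprocess

subprocess.run(
    ["unshare", "--uts", "--pid", "--mount", "--fork",
     "sh", "-c", "hostname demo-container && hostname"],
    check=True,
)

# Cgroups (v2): cap the memory that processes placed in the group may use.
cg = "/sys/fs/cgroup/demo"
os.makedirs(cg, exist_ok=True)
with open(os.path.join(cg, "memory.max"), "w") as f:
    f.write(str(256 * 1024 * 1024))               # 256 MB ceiling
with open(os.path.join(cg, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))                     # move this process into the group
```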

A so-called image is what you get by "welding the container shut" at some moment and saving its state. Just as when the Monkey King shouts "freeze," the container is frozen at that instant, and the state of that instant is saved as a set of files.

The format of these files is standard, so anyone who has them can restore that frozen moment. Restoring an image into a running state, that is, reading the image files and reviving that moment, is exactly what it means to run a container.

With containers, the PaaS layer's automatic deployment of users' own applications becomes fast and elegant.

Big Data Embraces Cloud Computing

A complex general-purpose application in the PaaS layer is the big data platform. How is big data integrated into cloud computing step by step?

Even small data contains wisdom

Big data was not big at the beginning. How much data was there originally? Today everyone reads e-books and news online, but back in the 1980s when we were young the amount of information was nowhere near as large; we read printed books and newspapers. How many words did a week's worth of newspapers add up to?

Outside the big cities, the library of an ordinary school amounted to no more than a few bookshelves. Later, with the rise of informatization, there was more and more information.

First of all, let's look at the data in big data. There are three types:

Structured data: data with a fixed format and limited length. A filled-in form is structured data, for example nationality: People's Republic of China, ethnicity: Han, sex: male.

Unstructured data: there is more and more of this now. It has variable length and no fixed format, such as web pages, which are sometimes very long and sometimes just a few words; voice and video are also unstructured data.

Semi-structured data: data in formats such as XML or HTML. Non-technical readers may not be familiar with these, and that is fine.

Data by itself is not useful; it has to be processed. The band you wear while running every day collects data, and the countless web pages on the Internet are also data. We call this Data.

Data by itself is of no use, but it contains something very important called information.

Raw data is messy; only after being organized and cleaned can it be called information. Information contains many patterns, and the patterns we distill from information are called knowledge. Knowledge changes destiny.

Information is everywhere, but some people look at it and see nothing, while others see in it the future of e-commerce or the future of live streaming. Those are the impressive ones.

If you never extract knowledge from information, scrolling through WeChat Moments every day leaves you a mere spectator of the Internet tide.

Some people take knowledge and apply it in practice, and do extremely well. That is what we call wisdom, or intelligence.

Having knowledge does not automatically bring wisdom. Many scholars, for example, are extremely knowledgeable and can analyze what has already happened from every angle, yet when it comes to actually doing something they cannot turn that knowledge into wisdom.

What makes many entrepreneurs great is precisely that they apply their knowledge in practice and end up building huge businesses.

So the use of data moves through four steps: data, information, knowledge, and wisdom.

That final stage is what many businesses want. Look, I have collected so much data; can I use it to inform my next decision and improve my product?

For example, while a user watches a video, an advertisement pops up next to it for exactly the thing he wants to buy; while a user listens to music, other music he would like is recommended.

Every mouse click a user makes on my application or website and every piece of text typed in is data to me. I want to extract something from it, use it to guide practice, and turn it into wisdom, so that users get hooked on my application, never want to leave my site, and keep clicking and keep buying.

Many people say they want to cut off their home Internet during Double Eleven: my wife shops and shops, buys A and gets B recommended, and then says, "Oh, B is exactly what I like too, honey, I want that as well."

How did this program get so smart, so wise, that it knows my wife better than I do? How did that happen?

How data is refined into wisdom

The processing of data goes through the following steps, and when they are complete, wisdom emerges at the end:

Data collection

Data transmission

Data storage

Data processing and analysis

Data retrieval and mining

Data collection

First there has to be data. There are two ways to collect it:

Pulling, known technically as grabbing or crawling (a small sketch follows these two items). A search engine, for example, works this way: it downloads all the information on the Internet into its data center, and then you search within that.

When you search, the result is a list. Why does the search engine company have that list? Because it has already crawled the data down; but once you click a result, the page you land on is no longer inside the search engine company.

For example, a Sina news article found through Baidu search: before you click, the result page sits in Baidu's data center; once you click through, the article page sits in Sina's data center.

Pushing: many devices help collect data for me. The Xiaomi Mi Band, for instance, can upload your daily running, heart rate, and sleep data to the data center.
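To make the "pull" path concrete, here is a toy crawler sketch in Python. The seed URL is a placeholder, and a real crawler would respect robots.txt, deduplicate more carefully, and run across many machines.

```python
# Toy illustration of crawling: fetch a page, store it ("download into our data
# center"), and queue the links it contains for later fetching.
import re
from collections import deque

import requests

seeds = deque(["https://example.com/"])      # placeholder starting point
seen, pages = set(), {}

while seeds and len(pages) < 10:             # small cap for the demo
    url = seeds.popleft()
    if url in seen:
        continue
    seen.add(url)
    html = requests.get(url, timeout=5).text
    pages[url] = html                        # keep a copy of the page
    for link in re.findall(r'href="(https?://[^"]+)"', html):
        seeds.append(link)                   # discover more pages to crawl
```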

Data transmission

This is usually done through a queue, simply because the volume of data is so large. The data has to be processed before it becomes useful, but the processing systems cannot keep up, so the data must wait in a queue and be handled bit by bit.
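A minimal in-process sketch of the idea: a producer fills a queue faster than the consumer drains it, and the queue absorbs the burst. Real pipelines use distributed, disk-backed queues (Kafka is a common example) in the same role.

```python
# Producer/consumer with a bounded buffer: collection outpaces processing, and
# the queue is what lets the two sides run at different speeds.
import queue
import threading
import time

buffer = queue.Queue(maxsize=1000)           # the "pipe" between collection and processing

def collector():
    for i in range(100):
        buffer.put({"event": i})             # data arrives in a burst

def processor():
    while True:
        event = buffer.get()
        time.sleep(0.01)                     # pretend processing is slow
        buffer.task_done()

threading.Thread(target=processor, daemon=True).start()
collector()
buffer.join()                                # wait until everything has been processed
```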

Data storage

Data is money these days; holding the data is effectively holding money. How else would a website know what you want to buy?

It knows because it holds your historical transaction data. That information cannot be handed to anyone else and is extremely valuable, so it has to be stored.

Data processing and analysis

What gets stored is raw data, and raw data is mostly messy, with plenty of garbage mixed in, so it needs to be cleaned and filtered into high-quality data.

High-quality data can then be analyzed to classify it, or to discover relationships between data items and thereby obtain knowledge.

The famous (if possibly apocryphal) Walmart story of beer and diapers came from analyzing purchase data and finding that men who buy diapers often buy beer at the same time.

That is how the relationship between beer and diapers was discovered, knowledge was gained, and, once applied in practice by moving the beer and diaper shelves closer together, turned into wisdom.
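A toy version of that kind of co-occurrence analysis, run over a made-up set of shopping baskets, might look like this:

```python
# Count how often pairs of items appear in the same basket; the most frequent
# pair hints at a rule worth acting on ("move the counters closer together").
from collections import Counter
from itertools import combinations

baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))
```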

Data retrieval and mining

Retrieval means search. As the saying goes, for matters abroad ask Google, for matters at home ask Baidu. Both put the analyzed data into a search engine, so when people want to find information, one search brings it up.

The other part is mining. Search alone no longer satisfies people; they also want the relationships hidden inside information to be dug out.

Take financial search: when you look up a company's stock, shouldn't the company's executives and their connections be mined as well?

If you only search out the stock, see that it looks great, and buy it, and then an executive issues a statement that is terrible for the stock and it falls the next day, doesn't that hurt ordinary investors? So it is very important to mine the relationships within data through various algorithms and build a knowledge base.

In the era of big data, many hands make light work

When the amount of data is small, a few machines can handle it. But as the data grows and grows until even the best single server cannot cope, what then?

Then we have to pool the strength of many machines and get the job done together.

For data collection: in IoT, thousands of sensing devices deployed in the field collect large volumes of temperature, humidity, surveillance, and power data; for a web search engine, all the pages of the entire Internet have to be downloaded.

A single machine obviously cannot do this. It takes many machines forming a web crawler system, each downloading a portion and all working at the same time, to fetch a huge number of pages within a limited time.

For data transmission: an in-memory queue would be overwhelmed by so much data, so disk-based distributed queues appeared. The queue then runs across many machines at once; however much data you have, as long as the queue is big enough and the pipe thick enough, it can be carried.

For data storage: one machine's file system cannot hold it all, so a large distributed file system is needed, stitching the hard disks of many machines into one big file system.

For data analysis: large volumes of data may need to be decomposed, counted, and summarized. A single machine would grind away and might never finish the analysis in any reasonable time.

So there is distributed computing, which splits a large data set into small pieces; each machine processes one piece, many machines work in parallel, and the computation finishes quickly.

The well-known TeraSort benchmark, for example, sorts 1 TB of data, about 1000G. Processed on a single machine it would take hours, but processed in parallel it was completed in 209 seconds.
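Here is a minimal map-reduce-style sketch of "split the data, process the pieces in parallel, then combine the results," counting words across a few made-up text shards with a local process pool standing in for a cluster:

```python
# The "map" step runs on each shard in a separate process; the "reduce" step
# merges the partial results into one answer.
from collections import Counter
from multiprocessing import Pool

shards = [
    "big data needs cloud computing",
    "cloud computing needs big data",
    "data data data",
]

def count_words(text):                        # map: count words in one shard
    return Counter(text.split())

if __name__ == "__main__":
    with Pool(processes=3) as pool:
        partial_counts = pool.map(count_words, shards)
    total = sum(partial_counts, Counter())    # reduce: merge the partial counts
    print(total.most_common(3))
```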

So what is big data? Put plainly: when one machine cannot finish the job, everyone does it together.

As data volumes keep growing, though, even many small companies need to process considerable amounts of data. What can a small company do without such a large fleet of machines?

Big data needs cloud computing, cloud computing needs big data

By now everyone has probably thought of cloud computing. These jobs need many machines only when they actually run; what is really wanted is, once again, resources whenever you want them and in whatever quantity you want.

For example, a big data job analyzing a company's finances might run only once a week. Keeping a hundred or a thousand machines sitting there for a once-a-week run would be very wasteful.

Wouldn't it be better to pull out those thousand machines only when the computation runs, and let them do other work the rest of the time?

Who can do that? Only cloud computing can provide resource-level flexibility for big data workloads.

Cloud computing, in turn, deploys big data onto its PaaS platform as a very important general-purpose application.

A big data platform that makes many machines work together is not something an ordinary person can develop, nor something an ordinary company can easily operate; running it well can take dozens or even hundreds of people.

So, just like the database, it still takes a team of professionals, and nowadays essentially every public cloud offers a big data solution.

When a small company needs a big data platform, it does not have to buy a thousand machines. It goes to the public cloud, where those thousand machines are a click away and the big data platform is already deployed; it only has to pour its data in.

Big data needs cloud computing, and cloud computing needs big data: that is how the two came together.

Artificial Intelligence Embraces Big Data

When will machines understand the human heart?

Even with big data, people's appetites are not satisfied. A big data platform has a search engine, and whatever you want to search for comes up.

But there are also cases where I do not know how to search for what I want, I cannot put it into words, and what the search returns is not what I wanted.

For example, a music app recommends a song I have never heard. Naturally I do not know its name and could not have searched for it, yet the app recommends it and I genuinely like it. That is something search cannot do.

When people use this kind of application, they feel that the machine knows what they want, rather than having to go and search once they already know. The machine understands me like a friend, and that starts to feel like artificial intelligence.

People have been imagining this for a long time. In the earliest imaginings, there is a wall with a machine behind it: I talk to it and it answers me.

If I cannot tell whether the thing on the other side is a person or a machine, then it really is artificial intelligence.

Teaching machines to reason

How can this be achieved? People first thought: I should give the computer humanity's ability to reason. What matters most about humans, what separates humans from animals? The ability to reason.

Wouldn't it be wonderful to hand my reasoning ability to the machine, so that it could reason out an answer to whatever question you ask?

In fact, people have gradually gotten machines to do some reasoning, such as proving mathematical theorems. It was a genuinely delightful surprise: a machine could actually prove mathematical formulas.

But the result slowly turned out to be less exciting than it first seemed, because people noticed a problem: mathematical formulas are extremely rigorous, the reasoning steps are extremely rigorous, and formulas are easy to express to a machine; the corresponding programs are relatively easy to write.

Human language is nowhere near that simple. Say you have a date with your girlfriend tonight and she tells you: "If you get there early and I'm not there yet, you wait; if I get there early and you're not there, you just wait!"

A machine has a hard time understanding that, but every person gets it, which is exactly why you do not dare be late for a date with your girlfriend.

Teaching machines knowledge

So strict reasoning alone is not enough; the machine also needs knowledge. But handing knowledge to a machine is something ordinary people probably cannot do. Perhaps experts can, say experts in linguistics or in finance.

Can knowledge in language or finance be expressed a little more rigorously, like a mathematical formula? Linguists, for example, might distill grammar rules such as subject, predicate, object, attributive, adverbial, and complement: the subject is always followed by the predicate, the predicate by the object. Wouldn't summarizing these rules and stating them rigorously be enough?

It turned out not to work: language is far too varied to summarize. Take subject-predicate-object: in spoken language the predicate is often simply dropped. Someone asks, "You who?" and I answer, "Me, Liu Chao."

Yet you cannot require people to speak standard written language to a machine for speech and semantic recognition; that would still not be intelligent. As Luo Yonghao once said in a talk, having to tell your phone in formal written language, "Please place a call to so-and-so," is rather embarrassing.

This stage of artificial intelligence is called the expert system. Expert systems are hard to make successful: on one hand the knowledge is hard to summarize, and on the other the summarized knowledge is hard to teach to a computer.

If you yourself are still fuzzy about it, sensing that there seems to be a pattern but unable to articulate it, how could you possibly teach it to a computer through programming?

Fine, if we can't teach you, learn it yourself

So people thought: machines are a completely different kind of creature from humans, so why not just let the machine learn by itself?

How does a machine learn? Since machines are so good at statistics, learning based on statistics should be able to uncover patterns from large quantities of numbers.

There is actually a nice example from the entertainment world that gives a glimpse of this:

A netizen tallied the lyrics of the 117 songs on 9 albums a well-known singer released in mainland China, counting each word at most once per song, and listed the top ten adjectives, nouns, and verbs together with their frequencies.

What if we write down an arbitrary string of digits and then, digit by digit, pick a word in turn from the adjective, noun, and verb lists and string them together?

Take pi, 3.1415926, for example; the corresponding words are: 坚强 (strong), 路 (road), 飞 (fly), 自由 (freedom), 雨 (rain), 埋 (bury), 迷惘 (lost).

Connect and polish them a little:

A strong child

Still walking on the road

Spreading wings, flying toward freedom

Letting the rain bury his confusion

Starting to feel a bit like lyrics, isn't it? Of course, real statistics-based learning algorithms are far more complex than this simple tally.
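As a toy sketch of the digit-to-word trick, here is a Python version. The three word lists below are just the handful of words quoted above, standing in for the full top-ten table from the original tally, so the exact output differs from the lyric in the text.

```python
# Cycle through adjective -> noun -> verb, picking one word per digit of pi.
adjectives = ["坚强", "自由", "迷惘"]      # placeholder subset of the tallied list
nouns      = ["路", "雨", "孩子"]          # placeholder subset
verbs      = ["飞", "埋", "前行"]          # placeholder subset

digits = "3141592"                          # digits of pi
pools = [adjectives, nouns, verbs]

words = []
for i, d in enumerate(digits):
    pool = pools[i % 3]                     # alternate between the three lists
    words.append(pool[int(d) % len(pool)])  # let the digit choose the word

print(" / ".join(words))
```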

Statistical learning, however, mostly captures simple correlations, for example that two words which always appear together are probably related; it cannot express complex correlations.

Moreover, the formulas behind statistical methods are often very complex. To simplify the computation, people frequently make all kinds of independence assumptions to lower the difficulty, yet in real life truly independent events are relatively rare.

Simulating how the brain works

So people turned from the world of machines back to reflect on how the human world actually works.

The human brain does not store piles of rules, nor does it keep huge tables of statistics; it works through the firing of neurons.

Each neuron receives input from other neurons, and when it receives input it produces an output that stimulates other neurons. Vast numbers of neurons react to one another, and all kinds of results emerge as output.

For example, when people see a beautiful woman their pupils dilate. The brain certainly is not applying rules about body proportions, nor is it running statistics over every beauty seen in a lifetime; neurons simply fire from the retina to the brain and back to the pupil.

In this process it is very hard to pin down what role each individual neuron plays in the final result; somehow it just works.

So people began to model the neuron with a mathematical unit.

This artificial neuron has inputs and an output, linked by a formula, and each input influences the output according to its importance, that is, its weight.

Then n such neurons are wired together into a neural network. The number n can be very large, and the neurons can be arranged in many columns, with many neurons in each column.

Each neuron can weight its inputs differently, so every neuron's formula is different. When something is fed into the network, we hope it produces an output that is correct from a human point of view.

For example, feed in a picture with the digit 2 written on it, and we want the second number in the output list to be the largest. From the machine's point of view it neither knows that the input picture shows a 2 nor what the series of output numbers means, and that is fine: humans know what they mean.

Just as real neurons neither know that the retina is looking at a beauty nor that the pupil dilates in order to see more clearly; the beauty appears, the pupil dilates, and that is enough.

For any given neural network, nobody can guarantee that an input of 2 will make the second output number the largest. Guaranteeing that requires training and learning.

After all, pupils dilating at the sight of beauty is itself the result of many years of human evolution. The learning process is simply: feed in large numbers of pictures, and whenever the result is not the desired one, adjust.

How to adjust? Every weight of every neuron is nudged slightly toward the target. Because there are so many neurons and so many weights, the network's output rarely flips all at once; instead it inches toward the target and eventually reaches it.

Of course, these adjustment strategies still take real skill and need algorithm experts to tune carefully. It is like someone whose pupils do not dilate enough at first to see the beauty clearly, so she runs off with someone else; the lesson learned for next time is to dilate the pupils a little more, not to flare the nostrils.
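As a toy, from-scratch sketch of "nudging every weight slightly toward the target," here is a tiny two-layer network learning XOR with plain gradient descent. The layer sizes, learning rate, and iteration count are made-up illustration values; real networks are vastly larger and trained with far more sophisticated methods.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # desired outputs (XOR)

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)                # input -> hidden weights
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)                # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    h = sigmoid(X @ W1 + b1)                                  # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)                       # how far the output is from the target
    d_h = d_out @ W2.T * h * (1 - h)                          # pass the error back to the hidden layer
    W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(axis=0)  # nudge every weight a little
    W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))   # typically close to [[0], [1], [1], [0]] after training
```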

It doesn't seem to make sense, but it works

It does not sound all that reasonable, but it really does work; that is just how willful it is!

The universality theorem of neural networks says roughly this: suppose someone hands you some complicated, exotic function f(x).

No matter what that function looks like, there is guaranteed to be a neural network that, for every possible input x, outputs f(x) (or an arbitrarily accurate approximation of it).
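Stated a little more formally, the standard single-hidden-layer version (in the style of Cybenko and Hornik, with a sigmoid-type activation σ) reads:

```latex
% Universal approximation, one hidden layer
\text{For any continuous } f : [0,1]^n \to \mathbb{R} \text{ and any } \varepsilon > 0,
\text{ there exist } N, \; v_i, b_i \in \mathbb{R}, \; w_i \in \mathbb{R}^n \text{ such that}
\qquad
g(x) \;=\; \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right)
\quad\text{satisfies}\quad
\sup_{x \in [0,1]^n} \bigl|\, g(x) - f(x) \,\bigr| \;<\; \varepsilon .
```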

If the function represents some underlying law, this means that no matter how strange or incomprehensible that law is, it can be expressed by a large number of neurons through the adjustment of a large number of weights.

An economic interpretation of artificial intelligence

This reminds me of economics, which makes it easier to understand.

Think of each neuron as an individual engaged in economic activity in society. The neural network is then the whole economy: each neuron weighs the inputs it receives from society and produces a corresponding output.

For example, wages rise, vegetable prices rise, stocks fall: what should I do, and how should I spend my money? Is there no pattern here? There certainly is, but exactly what pattern? It is hard to say.

An economy run on expert systems is a planned economy. Instead of letting economic laws emerge from the independent decisions of individual actors, it hopes that experts with lofty vision and great foresight can write the laws down. But no expert can ever know which street in which city is missing a vendor of sweet tofu pudding.

So when the experts decree how much steel and how many steamed buns to produce, the plan is often far from what people actually need, and even a planning document hundreds of pages long cannot capture the small patterns hidden in everyday life.

Macro regulation based on statistics is much more reliable: every year the statistics bureau measures indicators such as employment, inflation, and GDP for the whole society. These indicators reflect many underlying patterns; they cannot express them precisely, but they are relatively dependable.

Patterns summarized from statistics, however, are fairly coarse. Looking at such figures, an economist can conclude whether housing prices or stocks will rise or fall over the long run.

If the economy as a whole is rising, housing prices and stocks should both go up. But statistics alone cannot capture the fine-grained fluctuations of stock prices and consumer prices.

Microeconomics in the style of a neural network is the most accurate expression of how the economy really works: each person adjusts to the inputs they receive from society, and those adjustments feed back into society as inputs to others.

Think of the minute fluctuations of a stock market chart: they are precisely the result of countless independent individuals trading continuously, with no single unified law behind them.

Yet as each person makes independent decisions based on society's inputs, certain factors, after enough rounds of "training," also produce statistical regularities at the macro level, and that is exactly what macroeconomics can observe.

For example, every time the money supply is expanded massively, housing prices eventually rise; after enough rounds of training, everyone has learned the lesson.

Artificial intelligence needs big data

A neural network, however, contains an enormous number of nodes, each node carries a great many parameters, and the total number of parameters is huge, so the amount of computation required is huge as well.

That is fine, though: we have big data platforms that can pool the power of many machines to compute together and deliver the desired result within a limited time.

Artificial intelligence can do a great many things, such as identifying spam and detecting pornographic or violent text and images.

This, too, has gone through three stages:

Relying on keyword blacklists, whitelists, and filtering: text containing certain words is judged pornographic or violent. But as Internet slang multiplied and the words kept changing, constantly updating the word list became impossible to keep up with.

Using newer algorithms such as Bayesian filtering (see the sketch after this list). You do not need to know how the Bayesian algorithm works, but you have probably heard the name; it is a probability-based algorithm.

Using big data and artificial intelligence to build more precise user profiles and to understand text and images.
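For the second stage, here is a toy naive Bayes filter in Python, trained on a few made-up "spam" and "ham" messages; real filters use far larger corpora and more careful feature engineering.

```python
# Score a message against word frequencies learned from each class; Laplace
# smoothing keeps unseen words from zeroing out the probability.
import math
from collections import Counter

spam = ["win free money now", "free lottery win", "cheap pills free"]
ham = ["meeting at noon", "project status update", "lunch at noon tomorrow"]

def train(docs):
    words = Counter(w for d in docs for w in d.split())
    return words, sum(words.values())

spam_words, spam_total = train(spam)
ham_words, ham_total = train(ham)
vocab = set(spam_words) | set(ham_words)

def log_score(msg, words, total, prior):
    score = math.log(prior)
    for w in msg.split():
        score += math.log((words[w] + 1) / (total + len(vocab)))   # Laplace smoothing
    return score

msg = "free money tomorrow"
is_spam = log_score(msg, spam_words, spam_total, 0.5) > log_score(msg, ham_words, ham_total, 0.5)
print("spam" if is_spam else "ham")
```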

AI algorithms mostly depend on large amounts of data, and that data usually has to be accumulated over a long period in a specific domain (such as e-commerce or email).

Without data, even the best AI algorithm gets you nowhere. That is why AI programs are rarely delivered the way IaaS and PaaS are, with a copy installed for a customer to run on its own.

If you install a separate copy for a single customer who has no relevant data to train on, the results are usually poor.

Cloud vendors, on the other hand, have usually accumulated large amounts of data, so the AI program is installed inside the cloud vendor and exposed through a service interface.

For example, if you want to check whether a piece of text involves pornography or violence, you simply call this online service. In cloud computing this form of service is called Software as a Service, or SaaS.
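What calling such an online moderation service tends to look like from the client side is sketched below; the endpoint URL, credential, and response fields are hypothetical placeholders, not any particular vendor's real API.

```python
# Hedged sketch of a SaaS-style text moderation call over HTTPS.
import requests

resp = requests.post(
    "https://api.example-cloud.com/v1/text/moderation",   # hypothetical endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},      # placeholder credential
    json={"text": "some user-generated comment to check"},
    timeout=5,
)
result = resp.json()
# Hypothetical response shape: {"label": "ok" | "porn" | "violence", "score": 0.97}
print(result.get("label"), result.get("score"))
```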

And so artificial intelligence programs entered cloud computing as the SaaS layer.

A better life built on the relationship among the three

At last the three brothers of cloud computing are all present: IaaS, PaaS, and SaaS. So on a typical cloud computing platform, cloud, big data, and artificial intelligence can all be found.

A big data company that has accumulated masses of data will use AI algorithms to provide services; and an AI company cannot possibly do without a big data platform behind it.

So when cloud computing, big data, and artificial intelligence are integrated in this way, they complete the journey of meeting, getting to know, and truly understanding one another.
