Merge pull request #2 from scutan90/master

update
Suseike 2018-11-09 22:56:02 +08:00 committed by GitHub
commit cfdae8c1e2
18 changed files with 1026 additions and 134 deletions

LICENSE (new file, 674 lines)

@@ -0,0 +1,674 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:
<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<https://www.gnu.org/licenses/>.
The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.


@@ -1,6 +1,6 @@
# 1. Copyright Notice
Please respect the author's intellectual property rights. All rights reserved; unauthorized reproduction will be pursued.
Please join in safeguarding the fruits of this work and keep watch over it.
Please respect the author's intellectual property rights. All rights reserved; unauthorized reproduction will be pursued. Reposting this content without permission is strictly forbidden!
Please join in safeguarding the fruits of this work and keep watch over it. Reposting this content without permission is strictly forbidden!
2018.6.27 TanJiyong
# 2. Overview
@@ -63,7 +63,7 @@
1. Seeking friends, editors, and writers willing to keep improving the material; if you would like to collaborate, we can polish it into a book together (becoming co-authors).
All contributors who submit content will have their personal information credited in the text (e.g., "big shot - Westlake University").
2. Contact: please write to scutjy2015@163.com (the only official mailbox); WeChat: Tan, tan_weixin88
2. Contact: please write to scutjy2015@163.com (the only official mailbox); WeChat: Tan
(To join the group, first add, improve, or submit content in the MD version; it is then easier to join the group and enjoy sharing knowledge and helping others.)


@@ -107,3 +107,37 @@ Reference:
1. So one downside of end-to-end deep learning is that it excludes potentially useful hand-designed components. Carefully hand-designed components can be very useful, but they can also genuinely hurt your algorithm's performance.
For example, forcing your algorithm to think in units of phonemes may be worse than letting it find a better representation on its own.
So this is a double-edged sword: it can hurt and it can help, but it tends to help more, and hand-designed components tend to help more when the training set is smaller.
## 10.8 What is negative transfer? What causes it? (Wang Jindong, ICT, Chinese Academy of Sciences)
Negative transfer means that the knowledge learned on the source domain has a negative effect on learning in the target domain.
The main causes of negative transfer are:
- Data: the source and target domains are simply not similar, so what is there to transfer?
- Method: the source and target domains are similar, but the transfer learning method is not good enough to find the transferable components.
Negative transfer has harmed transfer learning research and applications. In practice, finding a reasonable similarity, and choosing or developing a suitable transfer learning method, can avoid negative transfer.
## 10.9 What are the basic problems of transfer learning? (Wang Jindong, ICT, Chinese Academy of Sciences)
There are three basic problems:
- How to transfer: how should the transfer be carried out? (designing transfer methods)
- What to transfer: given a target domain, how do we find the corresponding source domain to transfer from? (source domain selection)
- When to transfer: when is transfer possible, and when is it not? (avoiding negative transfer)
## 10.10 What is the basic idea of transfer learning? (Wang Jindong, ICT, Chinese Academy of Sciences)
The overall idea of transfer learning can be summarized as: develop algorithms that make maximal use of knowledge from annotated domains to assist knowledge acquisition and learning in the target domain.
The core of transfer learning is to find the similarity between the source domain and the target domain and exploit it sensibly. Such similarity is very common. For example, different people's bodies are built similarly; bicycles and motorcycles are ridden in similar ways; international chess and Chinese chess are similar; badminton and tennis are played in similar ways. This similarity can also be understood as an invariant. Only by meeting the changing with the unchanging can one remain undefeated.
Once this similarity exists, the next task is how to measure and exploit it. The measurement serves two goals: first, to measure the similarity of the two domains well, not only telling us qualitatively whether they are similar, but also giving the degree of similarity quantitatively; second, with the measure as the criterion, to increase the similarity between the two domains through the learning method we adopt, thereby completing the transfer.
**In one sentence: similarity is the core, and measurement criteria are the key tools.**
## 10.11 Why do we need transfer learning? (Wang Jindong, ICT, Chinese Academy of Sciences)
1. Big data vs. scarce annotation: although there is plenty of data, it is mostly unlabeled, so machine learning models cannot be trained on it, and manual labeling is too time-consuming.
2. Big data vs. weak computation: ordinary users cannot own huge datasets and computing resources, so they need to rely on transferring models.
3. Generic models vs. personalized needs: even on the same task, one model can rarely satisfy everyone's personalized needs, such as specific privacy settings; this requires adapting models between different people.
4. The needs of specific applications (such as cold start).


@@ -1,24 +1,31 @@
10.1 What principles should guide building a network?
10.1.1 The beginner principle.
# Chapter 12 Network Construction and Training
## 10.1 What principles should guide building a network?
### 10.1.1 The beginner principle.
Newcomers are not advised to start by building network models right away. The recommended learning order is:
- 1. Understand how neural networks work and get familiar with the basic concepts and terminology.
- 2. Read the papers of classic network models along with implementation source code (the deep learning framework depends on your own situation).
- 3. Find a dataset and run a network hands-on; try modifying an existing network architecture.
- 4. Design a network according to the needs of your own project.
10.1.2 The depth-first principle.
Increasing network depth usually improves accuracy, but at the same time sacrifices some speed and memory.
### 10.1.2 The depth-first principle.
Increasing network depth usually improves accuracy, but at the same time sacrifices some speed and memory. Depth should not be piled up blindly, though: add depth only once a shallow network already shows some effect. Depth is increased to raise the model's accuracy; if the shallow layers learn nothing, a deeper network will not help either.
### 10.1.3 Kernel size is generally odd.
10.1.3 Kernel size is generally odd.
Odd-sized kernels have the following advantages:
- 1. The anchor point sits exactly in the middle, which makes it convenient to slide the convolution with the central pixel as the reference and avoids shifting the positional information.
- 2. When Padding adds extra layers of zeros around the image, the two sides of the image remain symmetric.
10.1.4 A bigger kernel is not always better.
### 10.1.4 A bigger kernel is not always better.
AlexNet used some very large kernels, such as 11×11 and 5×5. The earlier belief was that the larger the kernel, the larger the receptive field, the more image information is seen, and therefore the better the features obtained. But large kernels cause computation to explode, which hinders increasing the model's depth and degrades computational performance. So in VGG and the Inception networks, a combination of two 3×3 kernels was found to work better than one 5×5 kernel, while the parameter count (3×3×2+1=19 < 26=5×5×1+1) is reduced; for this reason 3×3 kernels have since been widely used in all kinds of models.
10.2 Which classic network models are worth studying?
## 10.2 Which classic network models are worth studying?
Speaking of classic network models, one cannot avoid mentioning the classic competition in computer vision: ILSVRC, short for the ImageNet Large Scale Visual Recognition Challenge. It was exactly AlexNet's stunning debut at the ILSVRC 2012 challenge that set off a worldwide wave of deep learning enthusiasm; that year is also called the "first year of deep learning". In the ILSVRC competitions over the years, the neural networks that broke the competition record each time became classics in people's minds, objects that academia and industry raced to study and reproduce, and bases for new research.
@@ -40,7 +47,7 @@ AlexNet used some very large kernels, such as 11×11 and 5×5
>- 2) VGGNet
Paper: [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
Code: [tensorflow]https://github.com/tensorflow/tensorflow/blob/361a82d73a50a800510674b3aaa20e4845e56434/tensorflow/contrib/slim/python/slim/nets/vgg.py)
Code: [tensorflow](https://github.com/tensorflow/tensorflow/blob/361a82d73a50a800510674b3aaa20e4845e56434/tensorflow/contrib/slim/python/slim/nets/vgg.py)
Main characteristics:
>> - 1. A deeper network structure.
>> - 2. Small convolution kernels used throughout.
@@ -64,35 +71,58 @@ AlexNet used some very large kernels, such as 11×11 and 5×5
>> Proposed feature recalibration: by introducing attention-based reweighting, ineffective features are suppressed and effective features are given larger weights; it can easily be combined with existing networks to boost their performance without adding much computation.
**Evolution of network architectures in computer vision**
![Evolution of network architectures in CV](http://wx2.sinaimg.cn/mw690/005B3ViFly1fwthh0jw58j30q80aldgw.jpg)
![Evolution of network architectures in CV](./img/ch12/网络结构演进.png)
**ILSVRC champions over the years:**
![ILSVRC champions over the years](http://wx4.sinaimg.cn/mw690/005B3ViFly1fwswhzquw2j31810or78b.jpg)
![ILSVRC champions over the years](./img/ch12/历年冠军.png)
Since then, one's ILSVRC ranking has been an important yardstick for measuring the technical level of a research institution or company.
ILSVRC 2017 was the last edition. From 2018, the WebVision competition (Challenge on Visual Understanding by Learning from Web Data) takes over the baton. So even though the ILSVRC challenge has ended, its profound influence on and huge contribution to deep learning will go down in history.
10.3 Are there any tricks for network training?
10.3.1. A suitable dataset.
## 10.3 Are there any tricks for network training?
### 10.3.1. A suitable dataset.
- 1. No obviously dirty data (this largely avoids the loss becoming NaN).
- 2. The sample data are evenly distributed.
10.3.2. Suitable preprocessing methods.
### 10.3.2. Suitable preprocessing methods.
Regarding data preprocessing: before Batch Normalization appeared, the main practice was to subtract the mean and divide by the variance. Since Batch Normalization appeared, subtracting the mean and dividing by the variance are no longer necessary. The corresponding preprocessing methods are mainly data screening, data augmentation, and so on.
10.3.3. Network initialization.
### 10.3.3. Network initialization.
The crudest initialization is to set all parameters to zero, which is absolutely unacceptable. If all parameters are 0, every neuron produces the same output, and during back propagation all neurons within the same layer behave identically, which can directly make the model fail and not converge. The method introduced in Andrew Ng's videos is to initialize the network weights with random values drawn from a normal distribution with mean 0 and variance 1.
10.3.4. Trial runs on small-scale data.
### 10.3.4. Trial runs on small-scale data.
Before formal training starts, you can first do trial runs on small-scale data. The reasons are:
- 1. You can verify whether your training pipeline is correct.
- 2. You can observe the convergence speed, which helps tune the learning rate.
- 3. You can check GPU memory usage and maximize batch_size (provided batch normalization is used; as long as the GPU memory holds, pick it as large as possible).
10.3.5. Set a reasonable Learning Rate.
### 10.3.5. Set a reasonable Learning Rate.
- 1. Too large: the loss explodes, NaN outputs, and so on.
- 2. Too small: convergence is too slow, and training time is greatly prolonged.
- 3. A variable learning rate: for example, once the output accuracy reaches some threshold, halve the Learning Rate and continue training.
### 10.3.6. Loss functions
Loss functions fall into two broad classes: classification losses and regression losses (a small code sketch follows this list).
>1. Regression losses:
>> - 1) Mean squared error (MSE, quadratic loss, L2 loss):
the square of the difference between the target variable and the predicted value.
>> - 2) Mean absolute error (MAE, L1 loss):
the absolute value of the difference between the target variable and the predicted value.
>Comparing MSE and MAE: MSE is easier to optimize, but MAE is more robust to outliers. For more on the behavior of MAE and MSE, see [L1 vs. L2 Loss Function](https://rishy.github.io/ml/2015/07/28/l1-vs-l2-loss/).
>2. Classification losses:
>> - 1) Cross-entropy loss:
currently the most commonly used classification loss in neural networks.
>> - 2) Hinge loss:
>>widely used in support vector machines, and sometimes used as a loss function elsewhere. Drawback: the hinge loss penalizes samples with larger errors more severely, which makes the loss sensitive to noise.
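The formulas above are standard; as an illustration (the code is not from the original text, and the array names are made up), a minimal NumPy sketch of the four losses:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error (quadratic / L2 loss).
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Mean absolute error (L1 loss).
    return np.mean(np.abs(y_true - y_pred))

def cross_entropy(y_onehot, probs, eps=1e-12):
    # Cross-entropy for rows of class probabilities that sum to 1.
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))

def hinge(y_pm1, scores):
    # Binary hinge loss; labels are +1/-1, scores are raw margins.
    return np.mean(np.maximum(0.0, 1.0 - y_pm1 * scores))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.3])
print(mse(y_true, y_pred))  # 0.0366...
print(mae(y_true, y_pred))  # 0.1666...
```

The outlier sensitivity mentioned above is easy to see here: squaring in MSE amplifies a single large error, while MAE grows only linearly with it.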

[Binary file added (image, 76 KiB); contents not shown]

[Binary file added (image, 73 KiB); contents not shown]


@@ -77,19 +77,67 @@ Nvidia offers products for individual users (e.g., the GTX series) and enterprise users (e.g., the Tesla series)
Nvidia generally releases a new generation of GPUs every year or two; for example, the GTX 1000 series was released in 2017. Each series contains several models corresponding to different performance levels.
## 15.6 Setting up the software environment
Deep learning essentially means building algorithms and training models on top of a complete software system. How do you set up such a complete software system? For example, how to choose an operating system, and what problems come up during installation? This section gives a brief summary.
Deep learning essentially means building algorithms and training models on top of a complete software system. How do you set up such a complete software system? For example, how to choose an operating system, and what problems come up during installation? This section gives a brief summary.
### 15.6.1 Choosing an operating system?
Hardware vendors such as NVIDIA support every operating system fairly well, e.g., windows10 and the linux family. But because linux systems are very friendly to technical professionals, almost all deep learning systems today are built on linux; commonly used systems include the ubuntu family, the centos family, and so on.
Hardware vendors such as NVIDIA support every operating system fairly well, e.g., the windows family and the linux family. But because linux systems are fairly friendly to technical professionals, almost all deep learning systems today are built on linux; commonly used systems include the ubuntu family, the centos family, and so on.
When building a system, choosing a suitable operating system is a problem that newcomers to deep learning face. Here are a few suggestions:
(1) Just starting out, familiar with windows but not with linux or deep learning: you can begin learning on windows10 or similar systems;
(2) Some basic knowledge of linux but little deep learning knowledge: you can build the framework directly on a linux system, run some open-source projects, and study them step by step;
(3) Familiar with linux: without question, linux is strongly recommended; installing software is simple and work efficiency is high;
(1) Just starting out, familiar with windows but not with linux or deep learning: you can begin learning on the windows family of systems;
(2) Some basic knowledge of linux but little deep learning knowledge: you can build the framework directly on a linux system, run some open-source projects, and gradually study them in depth;
(3) Familiar with linux but not with deep learning theory: without question, linux is strongly recommended; installing software is simple and work efficiency is high;
In one sentence: if you are not familiar with linux, get familiar with it gradually; in the end you will come back to linux to build your deep learning system.
### 15.6.2 Local installation or docker?
### 15.6.2 Installing common basic software?
There are many deep learning frameworks to choose from, but they all share one trait: at present almost all of them train models on nvidia GPUs, and to make good use of an nvidia GPU, cuda and cudnn are must-have software installations.
(1) Installing cuda
cuda was introduced above; here we only briefly describe the concrete steps for installing it on a linux system. You can install cuda8.0 or cuda9.0 as needed; the installation steps for the two versions are basically the same. We take the most common ubuntu 16.04 lts as the example:
1) Official download pages:
cuda8.0: https://developer.nvidia.com/cuda-80-ga2-download-archive
cuda9.0: https://developer.nvidia.com/cuda-90-download-archive
Open the page and select the matching system version, as shown below:
![cuda8.0](./img/ch15/cuda8.0.png)
### 15.6.3 GPU driver issues
![cuda9.0](./img/ch15/cuda9.0.png)
2) In the command line, go to the location of the cuda installer and grant execute permission:
cuda8.0: sudo chmod +x cuda_8.0.61_375.26_linux.run
cuda9.0: sudo chmod +x cuda_9.0.176_384.81_linux.run
3) Run the command to install cuda:
cuda8.0: sudo sh cuda_8.0.61_375.26_linux.run
cuda9.0: sudo sh cuda_9.0.176_384.81_linux.run
After the command runs, the installation steps follow (nearly identical for cuda8.0 and cuda9.0):
(1) First the cuda license text appears; you can press the q key to skip reading it.
(2) Do you accept the previously read EULA?
accept/decline/quit: **accept**
(3) Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: **no**
(4) Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: **yes**
(5) Enter Toolkit Location
[ default is /usr/local/cuda-9.0 ]: just press the enter key
(6) Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: **yes**
(7) Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: **yes**
These are essentially the cuda installation steps.
(2) Installing cudnn
cudnn is nvidia's acceleration library dedicated to deep learning...
### 15.6.3 Local installation or docker?
### 15.6.4 GPU driver issues
## 15.7 Choosing a framework
@@ -102,6 +150,13 @@ Nvidia generally releases a new generation of GPUs every year or two; for example, the GTX 1000 series released in 2017
* Tensorflow
* PyTorch
pytorch is a deep learning framework that Facebook released only in 2017, relatively late compared with other frameworks. But that is also an advantage: its design avoids many problems of earlier frameworks, so it was extremely well received as soon as it came out.
Advantages: concise and consistent interfaces, seamless integration with python, excellent and readable code design, a very active community, and timely official bug fixes.
Disadvantages: model deployment in industry currently lags slightly behind other frameworks, but the subsequent pytorch1.0 release should bring big improvements; after the merger with caffe2, caffe2's excellent model-deployment capability can make up for this shortcoming.
Related resource links:
(1) Official tutorials: https://pytorch.org/tutorials/
(2) A curated list of pytorch-based open-source projects: https://github.com/bharathgs/Awesome-pytorch-list
(3)
* Keras
@@ -234,9 +289,7 @@ keras is a high-level programming interface that can choose among different backends, such as tensor
Disadvantages: it is wrapped up so well that the technical details are hidden from the user.
3, pytorch:
pytorch is a deep learning framework that Facebook released only in 2017, relatively late compared with other frameworks. But that is also an advantage: its design avoids many problems of earlier frameworks, so it was extremely well received as soon as it came out.
Advantages: concise and consistent interfaces, seamless integration with python, excellent and readable code design, a very active community, and timely official bug fixes.
Disadvantages: model deployment in industry currently lags slightly behind other frameworks, but the subsequent pytorch1.0 release should bring big improvements; after the merger with caffe2, caffe2's excellent model-deployment capability can make up for this shortcoming.
4, caffe2:
caffe2 is the second-generation version after caffe, also from Facebook...


@@ -159,7 +159,7 @@
Multiplying the three matrices on the right yields a matrix close to $A$; here, the closer $r$ is to $n$, the closer the product is to $A$.
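As an illustrative check of this statement (the code is not from the original text), a minimal NumPy sketch of the rank-$r$ reconstruction from the three SVD factors:

```python
import numpy as np

A = np.random.randn(6, 6)
U, s, Vt = np.linalg.svd(A)

for r in (2, 4, 6):
    # Keep only the r largest singular values and their vectors.
    A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    print(r, np.linalg.norm(A - A_r))  # error shrinks to ~0 as r approaches n=6
```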
## 1.10 Why does machine learning use probability?
The probability of an event is a measure of how likely that time is to occur. Although the occurrence of a given event in a single random trial is a matter of chance, random trials that can be repeated many times under identical conditions tend to exhibit clear quantitative regularities.
The probability of an event is a measure of how likely that event is to occur. Although the occurrence of a given event in a single random trial is a matter of chance, random trials that can be repeated many times under identical conditions tend to exhibit clear quantitative regularities.
Besides handling uncertain quantities, machine learning also needs to handle stochastic quantities. Uncertainty and stochasticity can arise from many sources; probability theory is used to quantify uncertainty.
Probability theory plays a central role in machine learning, because the design of machine learning algorithms usually relies on probabilistic assumptions about the data.


@@ -24,6 +24,9 @@ modify_log----> used to record the change log
1. Deleted 4.6.3 and 4.8.4
2. Revised the answer to question 4.3 and added some paper links
<----qjhuang-2018-11-9---->
1. Fixed some writing errors
Others----> to be added
2. Revised the readme content
3. Revised the modify log content


@@ -421,7 +421,7 @@ figure 6': the feature-map size shrinks between 17/8
![](./img/ch4/image47.png)
&nbsp;&nbsp;&nbsp;&nbsp;
The Inception modules in Inception v4 are, respectively, Inception A,Inception B,Inception C
The Inception modules in Inception v4 are, respectively, Inception A, Inception B, Inception C
![](./img/ch4/image48.png)
@@ -466,8 +466,6 @@ The reduction modules in Inception-ResNet-v2 are, respectively, reduction A and reduction B
![](./img/ch4/image63.png)
# 4.8 ResNet and its variants
&nbsp;&nbsp;&nbsp;&nbsp;
http://www.sohu.com/a/157818653_390227
&nbsp;&nbsp;&nbsp;&nbsp;
Since AlexNet's victory in the LSVRC 2012 classification competition, deep residual networks (Deep Residual Network) have arguably been the most groundbreaking achievement in the computer vision and deep learning communities over the past few years. ResNet makes it possible to train networks of up to hundreds or even thousands of layers while still achieving superb performance.


@@ -1,33 +1,40 @@
# Chapter 5 Convolutional Neural Networks(CNN)
# Chapter 5 Convolutional Neural Networks (CNN)
Tags (space-separated): deep learning
---
Markdown Revision 1;
Date: 2018/11/08
Editor: Li Xiaodan - Duke University
Contact: xiaodan.li@duke.edu
## 5.1 The layers that make up a convolutional neural network
A convolutional neural network has 3 main kinds of layers:
> * Convolution layers
> * Pooling layers
> * Fully connected layers
1. Convolutional neural networks are mainly applied to datasets with a grid-like topology.
2. Convolutional neural networks have been widely applied, with great success, to speech recognition, image processing, and face recognition.
3. A convolutional neural network mainly contains 3 kinds of layers:
> * Convolution layer: this layer convolves the input features with filters to obtain local features; the local features make up the feature maps.
> * Pooling layer: pooling operations include taking the maximum, the sum, or the average; the pooling operation acts on the feature maps.
> * Fully connected layer: based on the pooling results, the fully connected layer carries out classification, prediction, and similar operations.
A complete neural network is composed by stacking these three kinds of layers.
**Example structure**
Taking the CIFAR-10 dataset as an example, a typical convolutional network classifier on this dataset has the structure [INPUT - CONV - RELU - POOL - FC]:
Taking the CIFAR-10 dataset as an example, a typical convolutional network classifier has the structure [INPUT - CONV - RELU - POOL - FC]:
> * INPUT [32\*32\*3]: holds all the pixels of the raw image; width and height are both 32, with 3 RGB color channels.
> * CONV: each neuron in the convolution layer connects to a small region of the previous layer and computes the inner product of its weights with that region's pixels; as an example, the output data might be [32\*32\*12].
> * CONV: each neuron in the convolution layer connects to a small region of the previous layer and computes the inner product of its weights with that region's pixels; say the resulting data is [32\*32\*12].
> * RELU: the activation layer; the main computation is max(0,x), and the resulting data stays [32\*32\*12].
> * POOL: what the POOLing layer does can be understood as downsampling; the resulting dimensions might become [16\*16\*12].
> * FC: the fully connected layer is generally used to compute the final class scores, giving a result of [1\*1\*10], where the 10 corresponds to 10 different classes. As the name says, every neuron in this layer connects to every neuron in the previous layer.
> * POOL: the POOLing operation can be understood as downsampling; the resulting dimensions become [16\*16\*12].
> * FC: the fully connected layer is generally used to compute the final class scores, giving a result of [1\*1\*10], where the 10 corresponds to 10 different classes. Every neuron in this layer connects to every neuron in the previous layer.
In this way, the convolutional network acts as an intermediate channel that step by step turns the raw image data into the final class scores. One point worth mentioning: of the several kinds of layers just introduced, some have parameters awaiting training and others do not. In more detail, the convolution and fully connected layers contain weights and biases, while the RELU and POOLing layers are fixed function operations containing no weights or biases. The POOLing layers do, however, contain hyperparameters that we specify by hand, which we will come back to later.
In this way, the convolutional network acts as an intermediate channel that step by step turns the raw image data into the final class scores. One point worth mentioning: of the several kinds of layers just introduced, some need parameter training and others do not. In more detail, the convolution and fully connected layers contain weights and biases, while the RELU and POOLing layers are fixed function operations containing no weights or biases. The POOLing layers do, however, contain hyperparameters that we specify by hand.
**To summarize:**
* (1) A convolutional network is built by stacking layers of several different types (convolution / RELU / POOLing / fully connected, etc.).
* (2) Each layer's input is 3-dimensional data, and after computation the output is again 3-dimensional data.
* (3) The convolution and fully connected layers contain trainable parameters; the RELU and POOLing layers do not.
* (4) The convolution, fully connected, and POOLing layers contain hyperparameters; the RELU layer does not.
* A convolutional network is built by stacking layers of several different types (convolution / RELU / POOLing / fully connected, etc.).
* Each layer's input is 3-dimensional data, and after computation the output is again 3-dimensional data.
* The convolution and fully connected layers contain trainable parameters, but the RELU and POOLing layers do not.
* The convolution, fully connected, and POOLing layers contain hyperparameters; the RELU layer does not.
Below is a schematic of a convolutional network structure built for the CIFAR-10 dataset (a code sketch of this pipeline follows):
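As an illustration only, here is a minimal sketch of the [INPUT - CONV - RELU - POOL - FC] pipeline, assuming PyTorch; the channel count 12 and the single CONV/POOL stage follow the running example, everything else is an assumption:

```python
import torch
import torch.nn as nn

# INPUT [32*32*3] -> CONV [32*32*12] -> RELU -> POOL [16*16*12] -> FC [1*1*10]
model = nn.Sequential(
    nn.Conv2d(3, 12, kernel_size=3, padding=1),  # trainable weights and biases
    nn.ReLU(),                                   # fixed function, no parameters
    nn.MaxPool2d(kernel_size=2),                 # hyperparameters only, no weights
    nn.Flatten(),
    nn.Linear(12 * 16 * 16, 10),                 # class scores for 10 classes
)

x = torch.randn(1, 3, 32, 32)  # one CIFAR-10-sized image (NCHW layout)
print(model(x).shape)          # torch.Size([1, 10])
```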
@@ -35,7 +42,7 @@
## 5.2 How does convolution detect edge information?
The convolution operation is the most basic component of a convolutional neural network. The first few layers of the network detect edges; then, later layers may detect parts of objects, and still later layers may detect complete objects.
The convolution operation is the most basic component of a convolutional neural network. The first few layers of the network detect edges first; the following layers may detect parts of objects, and still later layers may detect complete objects.
First, let us introduce a concept, the filter:
![image](./img/ch5/img2.png)
@@ -50,37 +57,37 @@
![image](./img/ch5/img4.png)
Let us look at what happened: align the filter with the 3*3 region at the top left of the image, multiply element by element and sum, obtaining -5.
As shown by the dark blue region in the figure, the filter computes a weighted sum over the 3*3 region at the top left of the image, obtaining -5.
Likewise, shift the filter to the right and repeat the same operation, then shift it down, until the filter is aligned with the last cell at the bottom right of the image. Computing in turn gives a 4*4 matrix.
OK, now that we understand filters and the convolution operation, let us see why a filter can detect object edges:
Now that we understand filters and the convolution operation, let us see why a filter can detect object edges:
Take the simplest example:
![image](./img/ch5/img5.png)
As shown above, this picture is all white on the left half and all gray on the right half. The filter is still the previous one; convolve them:
As shown above, this picture is all white on the left half and all gray on the right half. We still use the previous filter and convolve it with this picture:
![image](./img/ch5/img6.png)
As you can see, the final result has a white band in the middle and gray on both sides, so the vertical edge has been found. Why? Because in the 6*6 image, the part marked by the red box, which is where the boundary in the image lies, gives 30 when convolved with the filter, whereas convolving any non-boundary part gives 0.
In this picture the white boundary is very thick because the 6*6 image is simply too small; if it were replaced by a 1000*1000 image, we would find that in the final result the boundary is not thick and is very clear.
In this picture the white boundary is very thick because the 6\*6 image is too small; for a 1000\*1000 image we would find that in the final result the boundary is thinner but very clear.
That is the example of detecting an object's vertical edges; for horizontal edges, just rotate the filter 90 degrees. A runnable sketch of this example follows.
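A minimal NumPy sketch of this vertical-edge example (illustrative; the bright/dark pixel values 10/0 and the filter values are the commonly used ones, which reproduce the 30-versus-0 outputs described above):

```python
import numpy as np

# 6x6 image: left half bright (10), right half dark (0).
image = np.zeros((6, 6))
image[:, :3] = 10

# Vertical-edge filter: responds where brightness drops from left to right.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Slide the filter over the image ("valid" positions): 6x6 -> 4x4.
out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out)  # every row is [0, 30, 30, 0]: the boundary lights up in the middle
```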
## 5.3 A few basic definitions for convolution?
First, we need to agree on the few parameters that define a convolution layer:
First, we need to define the parameters of a convolution layer.
### 5.3.1 Kernel size
Kernel Size: the kernel size defines the convolution's field of view. A common choice in two dimensions is 3, i.e., 3×3 pixels.
Kernel Size: the kernel size defines the convolution's field of view. For two dimensions, the usual choice is 3, i.e., 3×3 pixels.
### 5.3.2 Kernel stride
Stride: Stride defines the kernel's step size. Although its default value is usually 1, we can set the stride to 2 and downsample an image similarly to MaxPooling.
Stride: Stride defines the convolution kernel's step size. Although its default value is usually 1, we can set the stride to 2 and downsample an image similarly to MaxPooling.
### 5.3.3 Padding
@@ -88,40 +95,41 @@ Now that we understand filters and the convolution operation, let us see why a filter can
### 5.3.4 Input and output channels
A convolution layer takes a certain number of input channels (I) and computes a specific number of output channels (O). The parameters needed for such a layer can be computed as I*O*K, with K equal to the number of values in the kernel.
A convolution layer takes a certain number of input channels (I) and computes a specific number of output channels (O). The parameters needed for such a layer can be computed as I\*O\*K, with K equal to the number of values in the kernel; for example, I=3 input channels, O=12 output channels, and a 3×3 kernel (K=9) give 3\*12\*9 = 324 weights.
## 5.4 Types of convolution?
### 5.4.1 Ordinary convolution
Ordinary convolution is shown in the figure below.
![image](./img/ch5/img7.png)
### 5.4.2 Dilated convolution
Also called atrous convolution, dilated convolution introduces another convolution-layer parameter called the dilation rate, which defines the spacing between the values in a kernel. A 3×3 kernel with a dilation rate of 2 has the same field of view as a 5×5 kernel while using only 9 parameters. Imagine taking a 5×5 kernel and deleting every second row and column.
Also called atrous convolution, dilated convolution introduces another convolution-layer parameter called the dilation rate. The dilation rate defines the spacing between the values in a kernel. A 3×3 kernel with a dilation rate of 2 has the same field of view as a 5×5 kernel while using only 9 parameters: a 5×5 kernel with every second row and column deleted.
This gives a wider field of view at the same computational cost. Dilated convolutions are especially popular in the field of real-time segmentation. Use them if you need a wide field of view and cannot afford multiple convolutions or larger kernels.
This gives a wider field of view at the same computational cost. Dilated convolutions are especially popular in the field of real-time segmentation. If a wide field of view is needed but multiple convolutions or larger kernels cannot be afforded, dilated convolution is often used.
Example (a short code sketch follows the figure):
![image](./img/ch5/img8.png)
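A minimal sketch of the dilation parameter, assuming PyTorch (sizes illustrative): a 3×3 kernel with dilation rate 2 keeps 9 weights while covering a 5×5 extent:

```python
import torch
import torch.nn as nn

dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2, bias=False)
print(dilated.weight.shape)  # torch.Size([1, 1, 3, 3]): still only 9 weights

x = torch.randn(1, 1, 7, 7)
# The effective kernel extent is 5x5, so a 7x7 input yields a 3x3 output.
print(dilated(x).shape)      # torch.Size([1, 1, 3, 3])
```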
### 5.4.3 Transposed convolution
Transposed convolution is also deconvolution. Although some people often call it deconvolution directly, strictly speaking that is inappropriate, because it does not match the concept of a deconvolution. Deconvolutions do exist, but they are not common in the deep learning field. An actual deconvolution reverses the convolution process. Imagine putting an image into a convolution layer. Now pass the output into a black box, and out comes your original image again. That black box performs a deconvolution. It is the mathematical inverse process of a convolution layer.
Transposed convolution is also called deconvolution. Although some people often call it deconvolution directly, strictly speaking that is inappropriate, because it does not match the basic concept of a deconvolution. Deconvolutions do exist, but they are not common in the deep learning field. An actual deconvolution reverses the convolution process. Put an image into a convolution layer, then pass the output into a black box, and the black box finally outputs the original image. That black box performs a deconvolution, which is the mathematical inverse process of the convolution layer.
A transposed convolution is somewhat similar, in that the spatial resolution it produces is the same as that of a hypothetical deconvolution layer. However, the actual mathematical operation performed on the values is different. A transposed convolution layer performs a regular convolution but reverses its spatial transformation (spatial transformation).
A transposed convolution is somewhat similar to a deconvolution, in that at the same spatial resolution it acts like a hypothetical deconvolution layer. However, the actual mathematical operation performed on the values is different. A transposed convolution layer performs a regular convolution but reverses its spatial transformation (spatial transformation).
At this point you should be thoroughly confused, so let us look at a concrete example:
A 5×5 image is fed into a convolution layer. The stride is set to 2, there is no padding, and the kernel is 3×3. The result is a 2×2 image.
If we wanted to reverse this process, we would need the inverse mathematical operation so that 9 values are generated from each pixel we input. Then we would traverse the output image with a stride of 2. That would be a deconvolution process.
If we want to reverse this process, we need the inverse mathematical operation so that 9 values are generated from each pixel we input. We traverse the output image with the stride set to 2. That is a deconvolution process.
![image](./img/ch5/img9.png)
A transposed convolution does not do this. The only thing in common is that it guarantees the output will also be a 5×5 image, while still performing a normal convolution operation. To achieve this, we need to perform some elaborate padding on the input.
A transposed convolution works differently from this. The only thing in common is that its output is still a 5×5 image, while what it actually performs is a normal convolution operation. To achieve this, we need to perform some elaborate padding on the input.
As you can now imagine, this step does not reverse the process above. At least not with respect to the numerical values.
Therefore, this step does not reverse the process above, at least not with respect to the numerical values.
It merely reconstructs the earlier spatial resolution and performs a convolution. This may not be the mathematical inverse process, but for Encoder-Decoder architectures it is still very useful. This way we can combine upscaling an image with a convolution, instead of doing two separate processes. A small code sketch follows.
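A minimal sketch of the 5×5 example above, assuming PyTorch: a stride-2, unpadded 3×3 convolution shrinks 5×5 to 2×2, and a transposed convolution with the same settings restores the 5×5 spatial resolution (the shape, not the original values):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, stride=2)             # 5x5 -> 2x2
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2)  # 2x2 -> 5x5

x = torch.randn(1, 1, 5, 5)
y = conv(x)
print(y.shape)          # torch.Size([1, 1, 2, 2])
print(deconv(y).shape)  # torch.Size([1, 1, 5, 5]): same shape, not same values
```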


@@ -1,5 +1,10 @@
# Chapter 6 Recurrent Neural Networks (RNN)
Markdown Revision 2;
Date: 2018/11/07
Editor: Li Xiaodan - Duke University
Contact: xiaodan.li@duke.edu
Markdown Revision 1;
Date: 2018/10/26
Editor: Yang Guofeng - Chinese Academy of Agricultural Sciences
@@ -9,14 +14,17 @@ http://blog.csdn.net/heyongluoyao8/article/details/48636251
## 6.1 What is the difference between RNNs and FNNs?
Unlike traditional feed-forward neural networks (FNNs), RNNs introduce directed cycles and can handle problems in which the inputs are correlated with one another. **The directed-cycle structure is shown in the figure below:**
1. Unlike traditional feed-forward neural networks (FNNs), RNNs introduce directed cycles and can handle problems in which the inputs are correlated with one another.
2. RNNs can memorize training information from previous steps.
**The directed-cycle structure is shown in the figure below:**
![](./img/ch6/figure_6.1_1.jpg)
## 6.2 Typical characteristics of RNNs?
RNNs are meant for processing sequence data. In the traditional neural network model, data flows from the input layer through the hidden layers to the output layer; the layers are fully connected, while the nodes within each layer are unconnected. But such an ordinary neural network is powerless against many problems. For example, predicting the next word of a sentence generally requires the preceding words, because the words in a sentence are not independent of one another.
RNNs are called recurrent neural networks because the current output for a sequence also depends on the previous outputs. Concretely, the network memorizes the preceding information and applies it to the computation of the current output; that is, the nodes between hidden layers are no longer unconnected but connected, and the hidden layer's input includes not only the output of the input layer but also the hidden layer's output from the previous time step. In theory, RNNs can process sequence data of any length; in practice, however, to reduce complexity it is often assumed that the current state depends only on a few previous states. **The figure below is a typical RNN:**
1. RNNs are mainly used for processing sequence data. In the traditional neural network model, data flows from the input layer through the hidden layers to the output layer; the layers are generally fully connected, while the neurons within each layer are unconnected. But traditional neural networks cannot handle problems in which the data points are correlated with one another. For example, predicting the next word of a sentence generally requires the semantic information of the words before it, because the words earlier and later in a sentence are semantically related.
2. In RNNs, the current unit's output also depends on the outputs of earlier steps, hence the name recurrent neural network. Concretely, the current unit (cell) stores information from earlier steps and applies it to the computation of the current output. The nodes between hidden layers are connected, and the hidden layer's current output is determined jointly by the input vector at the current time step and the hidden-layer state from the previous time step.
3. In theory, RNNs can process sequence data of any length; in practice, however, to reduce complexity it is often assumed that the current state depends only on the states of a few earlier time steps. **The figure below is a typical RNN (a minimal cell sketch follows the figure):**
![](./img/ch6/figure_6.2_1.png)
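A minimal NumPy sketch of one recurrent step (illustrative; it assumes the common formulation h_t = tanh(U x_t + W h_{t-1} + b), with V mapping the hidden state to the output):

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b, c):
    # The hidden state depends on the current input and the previous hidden state.
    h_t = np.tanh(U @ x_t + W @ h_prev + b)
    y_t = V @ h_t + c  # output computed from the current hidden state
    return h_t, y_t

n_in, n_hid, n_out = 4, 8, 3
U = np.random.randn(n_hid, n_in)
W = np.random.randn(n_hid, n_hid)   # shared across all time steps
V = np.random.randn(n_out, n_hid)
b, c = np.zeros(n_hid), np.zeros(n_out)

h = np.zeros(n_hid)
for x_t in np.random.randn(5, n_in):  # a toy sequence of 5 input vectors
    h, y = rnn_step(x_t, h, U, W, V, b, c)
```

Note that the same W, U, V are reused at every step; this is the parameter sharing discussed in section 6.5 below.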
@@ -30,21 +38,22 @@ RNNs are called recurrent neural networks because the current output for a sequence also depends on the previous outputs
**Information flow in the figure:**
1. There is a one-way information flow from the input units to the hidden units;
2. At the same time, another one-way information flow goes from the hidden units to the output units;
3. In some cases, RNNs break the latter restriction and guide information from the output units back to the hidden units; these connections are called "Back Projections";
4. In some cases, the hidden layer's input also includes the state of the previous hidden layer, i.e., the nodes within the hidden layer can be self-connected or inter-connected.
1. A one-way information flow goes from the input units to the hidden units.
2. A one-way information flow goes from the hidden units to the output units.
3. In some cases, RNNs break the latter restriction and guide information from the output units back to the hidden units; these connections are called "Back Projections".
4. In some cases, the hidden layer's input also includes the hidden-layer state of the previous time step, i.e., the nodes within the hidden layer can be self-connected or inter-connected.
5. The current unit's (cell's) output is determined jointly by the input at the current time step and the hidden-layer state of the previous time step.
## 6.3 What can RNNs do?
RNNs have proven very successful in practice for NLP, e.g., word-vector representation, sentence validity checking, and part-of-speech tagging. Among RNNs, the most widely used and most successful model at present is LSTMs (Long Short-Term Memory), which can usually express long- and short-term dependencies better than vanilla RNNs; compared with ordinary RNNs, the model only changes the hidden layer.
RNNs have achieved great success in natural language processing, e.g., word-vector representation, sentence validity checking, and part-of-speech tagging. Among RNNs and their variants, the most widely used and most successful model at present is LSTMs (Long Short-Term Memory), which describes long- and short-term dependencies better than plain RNNs.
## 6.4 Typical applications of RNNs in NLP?
**(1) Language modeling and text generation (Language Modeling and Generating Text)**
Given a sequence of words, predict the probability of each word given the preceding ones. Language models can score how likely a sentence is to be correct, which is one part of machine translation; usually, the higher the likelihood, the more correct the sentence. Another application is to use a generative model to predict the probability of the next word and thereby generate new text by sampling from the output probabilities.
Given a group of word sequences, predict the probability of each word's occurrence based on the preceding words. Language models can evaluate how likely a sentence is to be correct; the higher the likelihood, the more correct the sentence. Another application is to use a generative model to predict the probability of the next word's occurrence, and thereby generate new text by sampling from the output probabilities.
**(2) Machine translation (Machine Translation)**
@@ -52,21 +61,22 @@ RNNs have proven very successful in practice for NLP, e.g., word-vector representation,
**(3) Speech recognition (Speech Recognition)**
Speech recognition means: given the acoustic signal of a sound wave, predict the sentence in a specified source language that corresponds to the sound wave, together with the probability of that sentence.
Speech recognition means: given the acoustic signal of a sound wave, predict the sentence in a specified source language that corresponds to the sound wave, and compute the probability of that sentence.
**(4) Generating image descriptions (Generating Image Descriptions)**
Like convolutional neural networks (CNNs), RNNs have been applied to the automatic generation of descriptions for unlabeled images. Combining CNNs with RNNs yields automatic generation of image descriptions.
Like convolutional neural networks (CNNs), RNNs have been applied to the automatic generation of descriptions for unlabeled images. The combination of CNNs and RNNs has also been applied to the automatic generation of image descriptions.
![](./img/ch6/figure_6.4_1.png)
## 6.5 Similarities and differences between RNN training and traditional ANN training?
**Similarities:** both use the BP error back-propagation algorithm.
**Similarities:**
1. Both RNNs and traditional ANNs use the BP (Back Propagation) error back-propagation algorithm.
**Differences:**
1. If an RNN is unrolled, the parameters W, U, V are shared, whereas a traditional neural network's parameters are not;
2. In gradient descent, each step's output depends not only on the network at the current step but also on the network states of several earlier steps.
1. The RNN parameters W, U, V are shared, whereas in a traditional neural network the parameters of different layers have no direct relation to one another;
2. For RNNs, in gradient descent each step's output depends not only on the network at the current step but also on the network states of several earlier steps.
## 6.6 Common RNN extensions and improved models
@@ -74,35 +84,42 @@ RNNs have proven very successful in practice for NLP, e.g., word-vector representation,
### 6.6.1 Simple RNNs (SRNs)
SRNs are a special case of RNNs: a three-layer network with context units added to the hidden layer; in the figure below, **y** is the hidden layer and **u** is the context unit. The connections between context-unit nodes and hidden-layer nodes are fixed (which connects to which), and the weights are also fixed (what the values are); in effect each context node corresponds one-to-one with a hidden-layer node, with a fixed value. At each step, a standard feed-forward pass propagates the signal, and then a learning algorithm is applied. Each context node stores the previous-step output of the hidden node it connects to (i.e., it stores the context) and acts on the state of the corresponding hidden node at the current step; that is, the hidden layer's input is determined by the input layer's output together with the hidden layer's own state from the previous step. SRNs can therefore solve sequence-prediction tasks that the standard multilayer perceptron (MLP) cannot.
1. SRNs are a special case of RNNs: a three-layer network with context units added to the hidden layer. In the figure below, **y** is the hidden layer and **u** is the context unit. The connections between context-unit nodes and hidden-layer nodes are fixed, and their weights are also fixed. Context nodes correspond one-to-one with hidden-layer nodes, with fixed values.
2. At each step, a standard feed-forward pass propagates the signal, and then a learning algorithm is applied. Each context node stores the previous-step output of the hidden node it connects to (i.e., it stores the context) and acts on the state of the corresponding hidden node at the current step; that is, the hidden layer's input is determined by the input layer's output together with the hidden layer's own state from the previous step. SRNs can therefore solve sequence-prediction problems that the standard multilayer perceptron (MLP) cannot.
**The SRN network structure is shown in the figure below:**
![](./img/ch6/figure_6.6.1_1.png)
### 6.6.2 Bidirectional RNNs
The improvement in Bidirectional RNNs (bidirectional networks) is the assumption that the current output (the output at step t) depends not only on the earlier part of the sequence but also on the later part. For example, predicting a missing word in a sentence requires its context. Bidirectional RNNs are a relatively simple kind of RNN, consisting of two RNNs stacked on top of each other. The output is determined by the hidden-layer states of the two RNNs. **As shown in the figure below:**
Bidirectional RNNs (bidirectional networks) stack two RNNs on top of each other; the output at the current time step (step t) depends not only on the earlier part of the sequence but also on the later part. For example, predicting a missing word in a sentence requires that word's context. Bidirectional RNNs are a relatively simple kind of RNN, consisting of two RNNs stacked on top of each other. The output is determined jointly by the forward RNN and the backward RNN. **As shown in the figure below:**
![](./img/ch6/figure_6.6.2_1.png)
### 6.6.3 Deep (Bidirectional) RNNs
### 6.6.3 Deep RNNs
Deep (Bidirectional) RNNs are similar to Bidirectional RNNs, except that each step's input goes through a multilayer network. This gives the network stronger expressive and learning power, but the complexity also increases and more training data are needed. **The structure of Deep (Bidirectional) RNNs is shown in the figure below:**
Deep RNNs are similar to Bidirectional RNNs in that they, too, stack multiple RNN layers, so each step's input goes through a multilayer network. Such a network has stronger expressive and learning power, but the complexity also rises and more training data are needed. **The structure of Deep RNNs is shown in the figure below:**
![](./img/ch6/figure_6.6.3_1.png)
### 6.6.4 Echo State Networks (ESNs)
Although ESNs (echo state networks) are also a kind of RNN, they differ greatly from traditional RNNs.
Although ESNs (echo state networks) are also a kind of RNN, they differ considerably from traditional RNNs.
**ESNs have three characteristics:**
(1) Their core structure is a randomly generated, fixed reservoir (Reservoir): a large-scale, randomly generated, sparsely connected recurrent structure (SD is usually kept at 1%-5%; SD denotes the fraction of mutually connected neurons out of the total number N of neurons in the reservoir);
1. Their core structure is a randomly generated, fixed reservoir (Reservoir). The reservoir is a large-scale, randomly generated, sparsely connected recurrent structure (SD is usually kept at 1%-5%; SD denotes the fraction of mutually connected neurons out of the total number N of neurons in the reservoir);
(2) The weight matrix from the reservoir to the output layer is the only part that needs adjusting;
2. The weight matrix from the reservoir to the output layer is the only part that needs adjusting;
(3) Simple linear regression suffices to train the network.
3. Simple linear regression suffices to train the network;
Structurally, ESNs are a special type of recurrent neural network. The basic idea is to replace the middle layer of a classical neural network with a large, randomly connected recurrent network, thereby simplifying training. The key to ESNs is thus the reservoir in the middle. The network's parameters include: W, the connection weight matrix among the reservoir nodes; Win, the connection weight matrix from the input layer to the reservoir, indicating that the neurons inside the reservoir are connected; Wback, the feedback connection weight matrix from the output layer to the reservoir, indicating that the reservoir receives feedback from the output layer; Wout, the connection weight matrix from the input layer, the reservoir, and the output layer to the output layer, indicating that the output layer connects not only to the reservoir but also to the input layer and itself; and Woutbias, the bias term of the output layer.
For ESNs, the key lies in four reservoir parameters: the spectral radius SR of the internal connection weights (SR = λmax = max{|eigenvalues of W|}; only when SR < 1 can ESNs have the echo state property), the reservoir size N (the number of neurons in the reservoir), the reservoir input scale IS (a scale factor multiplied into the input signal before it connects to the reservoir's internal neurons), and the reservoir sparsity SD (the fraction of mutually connected neurons out of the total number of reservoir neurons). As for IS, the stronger the nonlinearity of the task to be handled, the larger the input scale; the essence of this principle is that IS maps the input into the appropriate range of the neuron activation function (different input ranges of the activation function have different degrees of nonlinearity).
Structurally, ESNs are a special type of recurrent neural network. The basic idea is to replace the middle layer of a classical neural network with a large, randomly connected recurrent network, thereby simplifying training. The key to ESNs is thus the reservoir.
The network's parameters include:
1) W - the connection weight matrix among the reservoir nodes;
2) Win - the connection weight matrix from the input layer to the reservoir, indicating that the neurons inside the reservoir are connected;
3) Wback - the feedback connection weight matrix from the output layer to the reservoir, indicating that the reservoir receives feedback from the output layer;
4) Wout - the connection weight matrix from the input layer, the reservoir, and the output layer to the output layer, indicating that the output layer connects not only to the reservoir but also to the input layer and itself.
Woutbias - the bias term of the output layer.
For ESNs, the key lies in four reservoir parameters: the spectral radius SR of the internal connection weights (SR = λmax = max{|eigenvalues of W|}; only when SR < 1 can ESNs have the echo state property), the reservoir size N (the number of neurons in the reservoir), the reservoir input scale IS (a scale factor multiplied into the input signal before it connects to the reservoir's internal neurons), and the reservoir sparsity SD (the fraction of mutually connected neurons out of the total number of reservoir neurons). As for IS, the stronger the nonlinearity of the task, the larger the input scale; the essence of this principle is that IS maps the input into the appropriate range of the neuron activation function (different input ranges of the activation function have different degrees of nonlinearity).
**The ESN structure is shown in the figure below:**
![](./img/ch6/figure_6.6.4_1.png)
@@ -110,16 +127,19 @@ Although ESNs (echo state networks) are also a kind of RNN, they differ
![](./img/ch6/figure_6.6.4_3.png)
### 6.6.5 Gated Recurrent Unit Recurrent Neural Networks
GRUs are also an improved version of ordinary RNNs, improved mainly in the following **two aspects**.
GRUs are a variant version of ordinary RNNs, improved mainly in the following **two aspects**.
**First,** words at different positions in the sequence (taking words as the example) influence the current hidden state differently: the earlier the word, the smaller the influence; that is, each previous state's influence on the current one is weighted by distance, and the greater the distance, the smaller the weight.
1. Data at different word positions in the sequence (taking sentences as the example) influence the current hidden-layer state differently: the earlier the position, the smaller the influence; that is, each previous state's influence on the current one is weighted by distance, and the greater the distance, the smaller the weight.
**Second,** when an error is produced, the error may be caused by one or a few particular words, so only the weights of the corresponding words should be updated. The GRU structure is shown in the figure below. GRUs first compute the update gate and the reset gate from the current input word vector and the previous hidden state. They then compute the new memory content from the reset gate, the current word vector, and the previous hidden state. When the reset gate is 1, the new memory content ignores all previous memory content, and the final memory is a combination of the previous hidden state and the new memory content.
2. When an error is produced, it may be caused jointly by one or a few earlier words, so only the weights of the corresponding words should be updated. The GRU structure is shown in the figure below. GRUs first compute the update gate and the reset gate from the current input word vector and the previous hidden-layer state. They then compute the new memory content from the reset gate, the current word vector, and the previous hidden state. When the reset gate is 1, the new memory content ignores all previous memory content, and the final memory is determined jointly by the previous hidden state and the new memory content. (A minimal cell sketch follows the figure.)
![](./img/ch6/figure_6.6.5_1.png)
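A minimal NumPy sketch of one GRU step (illustrative; it assumes the standard formulation with update gate z, reset gate r, and new memory content):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x_t + Uz @ h_prev)            # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)            # reset gate
    h_new = np.tanh(Wh @ x_t + Uh @ (r * h_prev))  # new memory content
    # The final memory mixes the previous hidden state with the new memory.
    return (1.0 - z) * h_prev + z * h_new
```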
### 6.6.6 LSTM Netwoorks
LSTMs与GRUs类似目前非常流行。它与一般的RNNs结构本质上并没有什么不同只是使用了不同的函数去去计算隐藏层的状态。在LSTMs中i结构被称为cells可以把cells看作是黑盒用以保存当前输入xt之前的保存的状态ht1这些cells更加一定的条件决定哪些cell抑制哪些cell兴奋。它们结合前面的状态、当前的记忆与当前的输入。已经证明该网络结构在对长序列依赖问题中非常有效。**LSTMs的网络结构如下图所示**。
1. LSTMs是当前一种非常流行的深度学习模型。为了解决RNNs存在的长时记忆问题LSTMs利用了之前更多步的训练信息。
2. LSTMs与一般的RNNs结构本质上并没有太大区别只是使用了不同函数控制隐藏层的状态。
3. 在LSTMs中基本结构被称为cell可以把cell看作是黑盒用以保存当前输入之前Xt的隐藏层状态ht1。
4. LSTMs有三种类型的门遗忘门forget gate, 输入门input gate以及输出门output gate。遗忘门forget gate是用来决定 哪个cells的状态将被丢弃掉。输入门input gate决定哪些cells会被更新. 输出门output gate控制了结果输出. 因此当前输出依赖于cells状态以及门的过滤条件。实践证明LSTMs可以有效地解决长序列依赖问题。**LSTMs的网络结构如下图所示**。
![](./img/ch6/figure_6.6.6_1.png)
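For reference, the standard LSTM cell equations are:

$$f_t=\sigma(W_f x_t+U_f h_{t-1}+b_f)$$
$$i_t=\sigma(W_i x_t+U_i h_{t-1}+b_i)$$
$$o_t=\sigma(W_o x_t+U_o h_{t-1}+b_o)$$
$$\tilde{c}_t=\tanh(W_c x_t+U_c h_{t-1}+b_c)$$
$$c_t=f_t\odot c_{t-1}+i_t\odot \tilde{c}_t$$
$$h_t=o_t\odot \tanh(c_t)$$

The forget gate $f_t$ decides what to discard from the cell state, the input gate $i_t$ decides what to write, and the output gate $o_t$ decides what to expose as the hidden state, matching the three gates listed above.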
The differences between LSTMs and GRUs are shown in the figure below:
![](./img/ch6/figure_6.6.6_2.png)
As the figure shows, the two structures are very similar. **The differences are:**
1. In both, the new memory is computed from the previous state and the current input, but GRUs use a reset gate to control how much of the previous state enters, whereas LSTMs have no comparable gate;
2. They produce the new state differently: LSTMs have two separate gates, the forget gate (f gate) and the input gate (i gate), whereas GRUs have only a single update gate (z gate);
3. LSTMs can modulate the newly produced state through an output gate (o gate), whereas GRUs output it without any modulation.
### 6.6.7 Bidirectional LSTMs
1. Similar to bidirectional RNNs, bidirectional LSTMs have two LSTM layers: one processes past information and the other processes future information.
2. In bidirectional LSTMs, a forward LSTM produces the forward hidden state and a backward LSTM produces the backward hidden state; the current hidden state is the combination of the two.
### 6.6.8 Stacked LSTMs
1. Similar to deep RNNs, stacked LSTMs obtain a more complex model by stacking multiple LSTM layers.
2. Unlike bidirectional LSTMs, stacked LSTMs use only the information from earlier steps.
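A minimal PyTorch sketch covering both 6.6.7 and 6.6.8, with hypothetical sizes: `num_layers` stacks the LSTMs, and `bidirectional=True` adds the backward direction:

```python
import torch
import torch.nn as nn

# Stacked bidirectional LSTM: 2 stacked layers, each with a forward and a
# backward pass; the output at each step concatenates both directions.
lstm = nn.LSTM(input_size=100, hidden_size=64, num_layers=2,
               bidirectional=True, batch_first=True)

x = torch.randn(8, 20, 100)       # (batch, time, features)
out, (h_n, c_n) = lstm(x)         # out: (8, 20, 128); h_n: (4, 8, 64)
```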
### 6.6.9 Clockwork RNNs (CW-RNNs)
CW-RNNs are a relatively recent RNN model, first published at ICML 2014 in Beijing.
CW-RNNs are an improved variant of RNNs driven by clock rates. The hidden layer is divided into several blocks (groups/modules), and each group processes its input at its own prescribed clock rate. To reduce the complexity of RNNs, CW-RNNs cut the number of parameters, which improves network performance and speeds up training. CW-RNNs address the long-term dependency problem by running different hidden-layer modules at different clock rates. Clock time is discretized, and different hidden groups work at different time steps; since not all hidden groups are active at every step, training becomes faster. Moreover, neurons in groups with smaller clock periods are never connected to neurons in groups with larger periods; only larger-period neurons may connect to smaller-period ones (the connections between groups, and thus the information flow, are directed). A larger period means a slower group and a smaller period a faster one, so slow neurons connect to fast neurons, and not the other way around.
CW-RNNs have a network structure similar to SRNs, comprising an input layer, a hidden layer, and an output layer with forward connections between them (input-to-hidden and hidden-to-output). Unlike an SRN, however, the hidden-layer neurons are divided into $g$ groups of $k$ neurons each, and every group is assigned a clock period $T_i\in\{T_1,T_2,...,T_g\}$. All neurons within a group are fully connected, while a recurrent connection from group $j$ to group $i$ exists only if $T_j$ is greater than $T_i$. As shown in the figure below, if the groups are sorted left to right by increasing clock period, i.e. $T_1<T_2<...<T_g$, then the recurrent connections run from right to left. For example, suppose the hidden layer has 256 nodes divided into four groups with periods [1,2,4,8], so each group has 256/4=64 nodes. The fastest group (period 1) receives recurrent input from all four groups, so its hidden-to-hidden weight block is 64$\times$(4$\times$64)=64$\times$256; the period-2 group receives from three groups, 64$\times$(3$\times$64)=64$\times$192; the period-4 group from two, 64$\times$128; and the slowest group (period 8) only from itself, 64$\times$64. This makes concrete the statement above: slow groups connect to fast groups, and not the reverse.
**The structure of CW-RNNs is shown below:**
![](./img/ch6/figure_6.6.7_1.png)
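A NumPy sketch of the clockwork schedule and the directed block structure described above (sizes and weight scaling are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
periods = [1, 2, 4, 8]            # clock periods, T_1 < T_2 < T_3 < T_4
g, k = 4, 64                      # 4 groups of 64 neurons -> 256 hidden units
N, n_in = g * k, 10

# Directed block structure: group j feeds group i only if T_j >= T_i, so with
# groups ordered by increasing period, recurrent links run from slow to fast.
W = np.zeros((N, N))
for i in range(g):                # receiving (faster or equal-period) group
    for j in range(i, g):         # sending (slower or equal-period) group
        W[i*k:(i+1)*k, j*k:(j+1)*k] = 0.1 * rng.standard_normal((k, k))
Win = 0.1 * rng.standard_normal((N, n_in))

def step(h, x_t, t):
    """Only groups whose period divides t update at step t; the rest hold."""
    active = np.concatenate([np.repeat(t % T == 0, k) for T in periods])
    return np.where(active, np.tanh(W @ h + Win @ x_t), h)

h = np.zeros(N)
for t, x_t in enumerate(rng.standard_normal((16, n_in))):
    h = step(h, x_t, t)
```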
### 6.6.10 CNN-LSTMs
1. CNN-LSTMs were proposed to exploit the strengths of both CNNs and LSTMs. In this model, the CNN extracts object features and the LSTM performs the prediction. Thanks to its convolutional nature, the CNN can capture object features quickly and accurately, while the LSTM's strength is capturing long-term dependencies in the data.
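A minimal PyTorch sketch of this pattern, with hypothetical sizes: a small CNN encodes each frame of a clip, and an LSTM predicts from the sequence of features:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(                    # per-frame feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> 32-dim feature per frame
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, clips):                        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])                 # predict from the last step

logits = CNNLSTM()(torch.randn(2, 8, 3, 32, 32))     # -> (2, n_classes)
```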
# What Is a Generative Adversarial Network?
## An Intuitive Introduction to GANs
Generative adversarial networks (GANs) have sparked a wave of research since Ian Goodfellow proposed them in 2014. A GAN consists of a generator and a discriminator: the generator produces samples, and the discriminator judges whether the generator's samples are real. The generator tries to fool the discriminator as much as possible, while the discriminator tries to distinguish generated samples from real ones.
In the original GAN paper [1], the authors liken the generator to a counterfeiter printing fake banknotes and the discriminator to the police: the counterfeiter strives to make the notes look genuine, while the police keep improving their ability to spot fakes. The two play against each other and both grow stronger over time.
# A Formal Description of GANs
The example above only sketches the idea behind GANs; we now give a more concrete, formal definition. In general, both the generator and the discriminator can be implemented as neural networks, so the informal description can be represented by the following model:
![GAN structure](/images/7.1-gan_structure.png)
On the left of this model is the generator G, whose input is $$z$$; for the original GAN, $$z$$ is noise randomly sampled from a Gaussian distribution. The noise $$z$$ passes through the generator to yield a generated fake sample.
The generated fake samples and real samples are pooled together, drawn at random, and fed into the discriminator D, which must decide whether each input is a generated fake sample or a real one. The whole process is simple and clear: the "adversarial" in "generative adversarial network" lies mainly in the contest between generator and discriminator.
# The GAN Objective Function
To learn the parameters of the neural network model above, we first need an objective function. The GAN objective is defined as follows:
$$\mathop {\min }\limits_G \mathop {\max }\limits_D V(D,G) = {{\rm E}_{x\sim{p_{data}}(x)}}[\log D(x)] + {{\rm E}_{z\sim{p_z}(z)}}[\log (1 - D(G(z)))]$$
This objective function can be understood in two parts:
The discriminator is optimized via $$\mathop {\max}\limits_D V(D,G)$$, where $$V(D,G)$$ is the discriminator's objective function. Its first term, $${{\rm E}_{x\sim{p_{data}}(x)}}[\log D(x)]$$, is the expectation, over samples $$x$$ drawn from the real data distribution $$p_{data}$$, of the log-probability that the discriminator judges them real. For samples from the real data distribution this predicted probability should of course be as close to 1 as possible, so we want to maximize this term. The second term, $${{\rm E}_{z\sim{p_z}(z)}}[\log (1 - D(G(z)))]$$, concerns samples drawn from the noise distribution $$p_z(z)$$: each is passed through the generator to produce a generated image, which is then fed to the discriminator; the term is the expectation of the log of one minus the discriminator's prediction. The larger this value, the closer $$D(G(z))$$ is to 0, and hence the better the discriminator.
The generator is optimized via $$\mathop {\min }\limits_G({\mathop {\max }\limits_D V(D,G)})$$. Note that the generator's objective is not $$\mathop {\min }\limits_GV(D,G)$$; that is, the generator does **not minimize the discriminator's objective function** but rather minimizes **the maximum of the discriminator's objective function**. That maximum corresponds to the JS divergence between the real data distribution and the generated data distribution (see the appendix for the derivation). The JS divergence measures the similarity of two distributions: the closer the two distributions, the smaller the JS divergence.
# The GAN Objective and Cross-Entropy
Written in discrete form, the discriminator's objective function is $$V(D,G)=-\frac{1}{m}\sum_{i=1}^{i=m}\log D(x^i)-\frac{1}{m}\sum_{i=1}^{i=m}\log(1-D(\tilde{x}^i))$$
As can be seen, this objective function is consistent with cross-entropy; that is, **the discriminator's goal is to minimize a cross-entropy loss, while the generator's goal is to minimize the JS divergence between the generated data distribution and the real data distribution**.
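The correspondence with cross-entropy can be made explicit in code. Below is a sketch of one training step, assuming a generator `G` and a discriminator `D` (with sigmoid output of shape `(batch, 1)`) and their optimizers already exist; it uses the common non-saturating generator loss rather than minimizing $$\log(1-D(G(z)))$$ directly:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()   # -[y*log(p) + (1-y)*log(1-p)], matching the discrete objective

def train_step(G, D, opt_G, opt_D, x_real, z_dim=100):
    b = x_real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator: maximize E[log D(x)] + E[log(1 - D(G(z)))]
    x_fake = G(torch.randn(b, z_dim)).detach()
    loss_D = bce(D(x_real), ones) + bce(D(x_fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator: the non-saturating variant maximizes E[log D(G(z))]
    pred = D(G(torch.randn(b, z_dim)))
    loss_G = bce(pred, ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```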
-------------------
[1]: Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
# Chapter 7: Generative Adversarial Networks (GAN)
## What is the essence of GAN "generation"?
A GAN takes the form of two networks: G (Generator) and D (Discriminator). The Generator is a network that produces images; it receives random noise z and generates an image denoted G(z). The Discriminator is a network that judges whether an image is "real": its input is an image x, and its output D(x) is the probability that x is a real image; 1 means the image is 100% real, while 0 means it cannot possibly be real.
A GAN's *generative* capability comes from *learning a distribution*: the noise introduced through the latent variable shifts the learned probability distribution. Therefore, when training a GAN, the latent variable must **not** be drawn from a uniform distribution, because injecting uniformly distributed data does not change the probability distribution.
## Can GANs be used for data augmentation?
A GAN can "generate" unlimited outputs once a random number is fed into the model, so using GANs for data augmentation seems attractive and is a very clear insight. But consider the whole GAN training process: the Generator learns a distribution and then injects noise from some distribution (Gaussian or other) to "fool" the Discriminator; and whether one uses the KL divergence or the Wasserstein divergence (introduced elsewhere in this chapter), these are in essence means of measuring information. A Generator that can "fool" the Discriminator must be one that, given noise injected from some distribution, best combines the available information.
A well-trained GAN should already make good use of the information in the existing data (its features or distribution). The question then arises: this information is contained in the data to begin with, so is it necessary to have the Generator learn it, add noise to the learned result, and use that as the training input for a model?
## How do VAEs differ from GANs?
1. VAEs can be applied directly to discrete data.
2. A VAE's entire training pipeline relies on a single assumed loss function and the KL divergence to approximate the true distribution. A GAN assumes no single loss function; instead, the discriminator D and the generator G play against each other in the hope of reaching a Nash equilibrium.
## What are some notable GANs?
### DCGAN
WGAN and its extensions are the most-discussed part; the original authors published two papers back to back, the first (…)
**Why are KL/JS divergence problematic, and what makes Wasserstein divergence so much better?**
**KL divergence** is an **asymmetric** measure of the difference between two probability distributions P and Q. It measures the expected number of extra bits needed to encode samples from P using a code optimized for Q (i.e., how far the distribution has shifted). **JS divergence** is an upgraded version of KL divergence that fixes the **asymmetry** problem: JS divergence is symmetric. Because KL divergence lacks symmetry, treating it as a distance may be untenable; moreover, as the formula below shows, KL divergence blows up wherever $Q(x)\to 0$ while $P(x)>0$.
KL Divergence:
$$D_{KL}(P||Q)=-\sum_{x\in X}P(x)\log\frac{1}{P(x)}+\sum_{x\in X}P(x)\log\frac{1}{Q(x)}=\sum_{x\in X}P(x)\log\frac{P(x)}{Q(x)}$$
JS Divergence:
$$JS(P_1||P_2)=\frac{1}{2}KL(P_1||\frac{P_1+P_2}{2})+\frac{1}{2}KL(P_2||\frac{P_1+P_2}{2})$$
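A small numerical sketch of these two definitions on toy discrete distributions, illustrating that KL is asymmetric while JS is symmetric:

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))   # blows up where q -> 0 with p > 0

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

P = np.array([0.7, 0.2, 0.1])
Q = np.array([0.5, 0.3, 0.2])
print(kl(P, Q), kl(Q, P))   # different values: KL is asymmetric
print(js(P, Q), js(Q, P))   # identical values: JS is symmetric
```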
3. Hard-clip the updated weights to a fixed range, e.g. [-0.01, 0.01], to satisfy the Lipschitz continuity condition (a sketch follows below).
4. The paper also recommends optimizers such as SGD or RMSprop, and advises against momentum-based optimizers such as Adam.
In practice, however, since D and G each have their own loss, G and D can **use different optimizers**. In my view the best practice is SGD or RMSprop for G, and Adam for D.
I look forward to future optimization methods designed specifically for finding equilibria.
**What improvements does WGAN-GP make?**
**How should the Wasserstein distance be understood?**
The Wasserstein distance is related to optimal transport, and understanding it well mathematically requires some measure theory.
### Conditional GAN
### InfoGAN
$$L^{infoGAN}_{G}=L^{GAN}_G-\lambda L_1(c,c')$$
### StarGAN
Currently the best-performing GAN for image-to-image translation.
### Self-Attention GAN
## What makes GAN training difficult?
GAN convergence requires that **the two networks, D and G, reach an equilibrium simultaneously**.
A GAN is a form of semi-supervised learning: the training set does not require much labeled data.
Instance Norm works better than Batch Norm.
Using deconvolutions (transposed convolutions) to generate images works better than fully connected layers: fully connected layers produce more noise, while deconvolution layers give cleaner results.
## How can GANs handle NLP problems?
GANs are only suitable for generating continuous data and work poorly on discrete data. So if an NLP method directly applies a character-wise scheme, a gradient-based GAN cannot backpropagate (BP) the gradients to the generator network; judging from training results, G in such a GAN is persistently dominated by D.
modify_log----> records the modification log
4. Revised or deleted some formulas and statements in 9.9.3 (open to discussion); added paper links
5. Revised or deleted some formulas and statements in 9.9.4 (open to discussion); added paper links
<----qjhuang-2018-11-7---->
1. Revised some statements in the answer to 9.5 (open to discussion)
<----qjhuang-2018-11-9---->
1. Revised some answer formulas and links
Others----> to be added
2. Revised README content
3. Revised modify-log content
learning rate: 0.001.
&emsp;&emsp;
(2) The left half of the network is the contracting path, which uses convolutions and max pooling.
&emsp;&emsp;
(3) The right half of the network is the expanding path: feature maps produced by upsampling are concatenated with the feature maps produced by the corresponding layers of the contracting path on the left. Pooling layers lose image information and reduce image resolution, and the operation is irreversible; this affects image segmentation to some extent, while mattering little for image classification. Why upsample? Upsampling can restore some of the image information, but the restoration is necessarily incomplete, so the result is further connected to the higher-resolution feature maps on the left (copied over directly and cropped to the size of the upsampled map). This amounts to a compromise between high resolution and more abstract features: as the number of convolutions increases, the extracted features become more effective and more abstract; the upsampled maps, having gone through many convolutions, are comparatively effective and abstract, and they are then concatenated with the less abstract but higher-resolution feature maps from the left.
&emsp;&emsp;
(4) Finally, two more deconvolution operations produce feature maps, and two 1x1 convolutions perform the classification, yielding the final two heatmaps, e.g. the first holding the scores for class one and the second the scores for class two. These are fed into a softmax; the class with the larger softmax probability is selected, and cross-entropy is backpropagated for training.
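A toy PyTorch sketch of one skip connection on the expanding path (channel sizes are hypothetical): upsample with a transposed convolution, center-crop the encoder feature map, and concatenate:

```python
import torch
import torch.nn as nn

def center_crop(enc, target_hw):
    """Crop an encoder feature map to the target spatial size."""
    h, w = enc.shape[-2:]
    th, tw = target_hw
    dh, dw = (h - th) // 2, (w - tw) // 2
    return enc[..., dh:dh + th, dw:dw + tw]

up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)   # expanding path
dec = up(torch.randn(1, 128, 28, 28))                       # -> (1, 64, 56, 56)
enc = torch.randn(1, 64, 64, 64)                            # contracting-path map
merged = torch.cat([center_crop(enc, dec.shape[-2:]), dec], dim=1)  # (1, 128, 56, 56)
```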
The role of a RefineNet block is to fuse feature maps of different resolution levels.
&emsp;&emsp;
The residual convolution unit is just an ordinary residual unit with BN removed;
&emsp;&emsp;
Multi-resolution fusion first applies a convolution layer to each input feature map for adaptation (bringing them all to the shape of the smallest feature map), then upsamples, then performs element-wise addition. Note that for a single-input block like RefineNet-4, this part is simply skipped;
&emsp;&emsp;
In chained residual pooling, the ReLU is important for the effectiveness of the subsequent pooling and also makes the model less sensitive to changes in the learning rate. This chained structure can capture background context from a very large region. In addition, the structure makes heavy use of identity-mapping connections, both short- and long-range, which allow gradients to propagate directly from one block to any other;
&emsp;&emsp;
Output convolutions simply add one more RCU before the output.
Why K masks? Assigning each class its own mask effectively avoids inter-class competition.
## **9.9 Applications of CNNs in Weakly Supervised Image Segmentation**
&emsp;&emsp;
Source: [Applications of CNNs in weakly supervised image segmentation](https://zhuanlan.zhihu.com/p/23811946)
&emsp;&emsp;
Most recent deep-learning-based image segmentation techniques rely on training a convolutional neural network (CNN), which requires a very large number of labeled images; in general, every training image must come with an accurate segmentation ground truth.
&emsp;&emsp;
&emsp;&emsp;
If the learning algorithm could produce good segmentation results by learning from roughly labeled datasets, the labeling process would become much simpler, greatly reducing the time spent on annotating training data. Such rough labels can be:
&emsp;&emsp;
1. indicating only which objects an image contains,
&emsp;&emsp;
2. giving the bounding box of some object,
&emsp;&emsp;
3. labeling a subset of pixels in the object regions, e.g. drawing a few lines or scribbles.
**9.9.1 Scribble Annotations**
&emsp;&emsp;
ScribbleSup proceeds in two steps: first, pixel category information is propagated from the scribbles to the other unlabeled pixels, automatically completing the labels for all training images; second, these labeled images are used to train a CNN. In the first step, the method generates super-pixels and then labels all super-pixels with a graph-cut-based method.
<center><img src="./img/ch9/figure_9.9_2.png"></center>
&emsp;&emsp;
The graph cut energy function is:
Deepak Pathak of UC Berkeley used a training set with image-level labels.
&emsp;&emsp;
This method treats training as an optimization problem with linear constraints:
$$
\underset{\theta ,P}{\text{minimize}}\qquad D(P(X)\|Q(X|\theta ))\\
\text{subject to}\qquad A\overrightarrow{P} \geqslant \overrightarrow{b},\quad \sum_{X}P(X)=1
$$
&emsp;&emsp;
Paper: [Learning to Segment Under Various Forms of Weak Supervision (CVPR 2015)](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Xu_Learning_to_Segment_2015_CVPR_paper.pdf)
&emsp;&emsp;
Jia Xu of the University of Wisconsin-Madison proposed a unified framework to handle various kinds of weak labels: image-level labels, bounding boxes, and partial pixel labels such as scribbles. The method divides all training images into a total of $n$ super-pixels and extracts a $d$-dimensional feature vector for each. Since the class of each super-pixel is unknown (akin to unsupervised learning), the method clusters all super-pixels using max-margin clustering (MMC); the optimization objective of this process is:
$$
\underset{W,H}{\min} \qquad \frac{1}{2}tr\left ( W^TW \right ) + \lambda\sum_{p=1}^{n}\sum_{c=1}^{C}\xi \left ( w_c;x_p;h_p^c \right)