diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..f288702 --- /dev/null +++ b/LICENSE @@ -0,0 +1,674 @@ + GNU GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU General Public License is a free, copyleft license for +software and other kinds of works. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +the GNU General Public License is intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. We, the Free Software Foundation, use the +GNU General Public License for most of our software; it applies also to +any other work released this way by its authors. You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. + + To protect your rights, we need to prevent others from denying you +these rights or asking you to surrender the rights. Therefore, you have +certain responsibilities if you distribute copies of the software, or if +you modify it: responsibilities to respect the freedom of others. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must pass on to the recipients the same +freedoms that you received. You must make sure that they, too, receive +or can get the source code. And you must show them these terms so they +know their rights. + + Developers that use the GNU GPL protect your rights with two steps: +(1) assert copyright on the software, and (2) offer you this License +giving you legal permission to copy, distribute and/or modify it. + + For the developers' and authors' protection, the GPL clearly explains +that there is no warranty for this free software. For both users' and +authors' sake, the GPL requires that modified versions be marked as +changed, so that their problems will not be attributed erroneously to +authors of previous versions. + + Some devices are designed to deny users access to install or run +modified versions of the software inside them, although the manufacturer +can do so. This is fundamentally incompatible with the aim of +protecting users' freedom to change the software. The systematic +pattern of such abuse occurs in the area of products for individuals to +use, which is precisely where it is most unacceptable. Therefore, we +have designed this version of the GPL to prohibit the practice for those +products. If such problems arise substantially in other domains, we +stand ready to extend this provision to those domains in future versions +of the GPL, as needed to protect the freedom of users. + + Finally, every program is threatened constantly by software patents. +States should not allow patents to restrict development and use of +software on general-purpose computers, but in those that do, we wish to +avoid the special danger that patents applied to a free program could +make it effectively proprietary. 
To prevent this, the GPL assures that +patents cannot be used to render the program non-free. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU General Public License. + + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. + + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. + + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. + + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. + + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. 
However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. + + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. + + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. 
+ + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. + + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. + + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. 
+ + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. + + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. + + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. 
Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. + + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. 
+ + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. + + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. 
For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. + + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. 
You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Use with the GNU Affero General Public License. + + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU Affero General Public License into a single +combined work, and to convey the resulting work. The terms of this +License will continue to apply to the part which is the covered work, +but the special requirements of the GNU Affero General Public License, +section 13, concerning interaction through a network will apply to the +combination as such. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. 
+
+  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. Limitation of Liability.
+
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+  17. Interpretation of Sections 15 and 16.
+
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+    <program>  Copyright (C) <year>  <name of author>
+    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<https://www.gnu.org/licenses/>.
+
+  The GNU General Public License does not permit incorporating your program
+into proprietary programs.  If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.  But first, please read
+<https://www.gnu.org/licenses/why-not-lgpl.html>.
diff --git a/README.md b/README.md
index ee0d3a7..b585d4d 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # 1. Copyright Notice
-Please respect the authors' intellectual property. All rights reserved; unauthorized reproduction will be pursued.
-Please help us protect the fruits of this work and report violations.
+Please respect the authors' intellectual property. All rights reserved; unauthorized reproduction will be pursued. Reposting this content without permission is strictly forbidden!
+Please help us protect the fruits of this work and report violations. Reposting this content without permission is strictly forbidden!
     2018.6.27 TanJiyong
 # 2. Overview
@@ -63,7 +63,7 @@
 1. We are looking for collaborators willing to keep improving the book — editors and writers; if you would like to work together toward publication, you can become a co-author. Every contributor of submitted content will be credited in the text with their personal information (e.g. 大佬-西湖大学).
-2. Contact: please write to scutjy2015@163.com (the only official email); WeChat Tan: tan_weixin88
+2. Contact: please write to scutjy2015@163.com (the only official email); WeChat Tan:
 (First add, improve, and submit content in the Markdown version — that makes it easier to join the group, and to share knowledge and help others.)
diff --git a/ch11_迁移学习/第十一章_迁移学习.md b/ch11_迁移学习/第十一章_迁移学习.md
index 830f922..e41005b 100644
--- a/ch11_迁移学习/第十一章_迁移学习.md
+++ b/ch11_迁移学习/第十一章_迁移学习.md
@@ -107,3 +107,37 @@ Reference:
 1. So one drawback of end-to-end deep learning is that it excludes potentially useful hand-designed components. Carefully hand-designed components can be very useful, but they can also genuinely hurt your algorithm's performance — for example, forcing the algorithm to think in phonemes, when letting it find a better representation on its own might work better. It is a double-edged sword: there can be harm and there can be benefit, but the benefits usually outweigh the harm, and hand-designed components help most when the training set is small.
+
+## 10.8 What is negative transfer, and what causes it? (Wang Jindong, Institute of Computing Technology, CAS)
+
+Negative transfer means that knowledge learned in the source domain has a negative effect on learning in the target domain.
+
+Its main causes are:
+- The data: the source and target domains are simply not similar — then what is there to transfer?
+- The method: the domains are similar, but the transfer learning method is not good enough to find the transferable components.
+
+Negative transfer harms both research on and applications of transfer learning. In practice, finding a reasonable notion of similarity, and choosing or developing a suitable transfer method, can avoid negative transfer.
+
+## 10.9 What are the basic problems of transfer learning? (Wang Jindong, ICT, CAS)
+
+There are three basic problems:
+- How to transfer: how should the transfer be carried out? (designing transfer methods)
+- What to transfer: given a target domain, how do we find a corresponding source domain to transfer from? (source-domain selection)
+- When to transfer: when is transfer possible, and when is it not? (avoiding negative transfer)
+
+## 10.10 What is the basic idea of transfer learning? (Wang Jindong, ICT, CAS)
+
+The overall idea of transfer learning is to develop algorithms that make maximal use of knowledge from a labeled domain to assist knowledge acquisition and learning in the target domain.
+
+Its core is to find the similarity between the source domain and the target domain and to exploit it sensibly. Such similarity is everywhere: different people's bodies are built alike; bicycles and motorcycles are ridden in similar ways; international chess resembles Chinese chess; badminton is played much like tennis. The similarity can also be understood as an invariant — meeting endless change with what does not change is how one stays undefeated.
+
+Given such similarity, the next task is to measure and exploit it. The measurement serves two goals: first, to quantify the similarity of two domains — not merely to say qualitatively that they are similar, but to state how similar; second, to use the measure as a criterion and, through the chosen learning method, increase the similarity between the two domains so as to accomplish the transfer. A sketch of one common measure follows.
+
+**In one sentence: similarity is the core; the similarity measure is the key tool.**
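One widely used quantitative similarity measure between two domains is Maximum Mean Discrepancy (MMD). The NumPy sketch below is our own illustration, not code from the book; the function names, the RBF bandwidth `gamma`, and the toy domains are all assumptions chosen for demonstration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=1.0):
    """Squared MMD between source samples Xs and target samples Xt."""
    return (rbf_kernel(Xs, Xs, gamma).mean()
            + rbf_kernel(Xt, Xt, gamma).mean()
            - 2 * rbf_kernel(Xs, Xt, gamma).mean())

# Toy example: a strongly shifted "domain" is measurably less similar.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 10))
near   = rng.normal(0.2, 1.0, size=(200, 10))
far    = rng.normal(2.0, 1.0, size=(200, 10))
print(mmd2(source, near))  # small value -> domains are similar
print(mmd2(source, far))   # larger value -> transfer is riskier
```

A small MMD suggests the domains are close and transfer is promising; a large one is a warning sign for the negative transfer described in §10.8.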
+
+## 10.11 Why do we need transfer learning? (Wang Jindong, ICT, CAS)
+
+1. The conflict between big data and scarce labels: there is plenty of data, but most of it is unlabeled and cannot be used to train machine learning models, and manual labeling is far too time-consuming.
+2. The conflict between big data and weak compute: ordinary users cannot afford huge datasets or the computing resources to match, so they need to transfer from existing models.
+3. The conflict between one-size-fits-all models and personalized needs: even on the same task, a single model can rarely satisfy every user's personal requirements, such as particular privacy settings; this calls for adapting models between users.
+4. The needs of specific applications, such as cold start.
\ No newline at end of file
diff --git a/ch12_网络搭建及训练/第十二章_网络搭建及训练.md b/ch12_网络搭建及训练/第十二章_网络搭建及训练.md
index 7a60ce7..0cee9a6 100644
--- a/ch12_网络搭建及训练/第十二章_网络搭建及训练.md
+++ b/ch12_网络搭建及训练/第十二章_网络搭建及训练.md
@@ -1,24 +1,31 @@
-10.1 What are the principles of network design?
-10.1.1 The beginner principle.
+# Chapter 12  Building and Training Networks
+## 10.1 What are the principles of network design?
+
+### 10.1.1 The beginner principle.
+
 Newcomers are not advised to start by building network models from scratch. A recommended learning order:
 - 1. Understand how neural networks work and get familiar with the basic concepts and terminology.
 - 2. Read the papers of the classic network models together with their implementations (choose a deep learning framework according to your own situation).
 - 3. Find a dataset and train a network hands-on; try modifying existing network architectures.
 - 4. Design a network to the needs of your own project.
+### 10.1.2 The depth-first principle.
+
+Increasing network depth usually improves accuracy, at some cost in speed and memory. But depth should not be piled up blindly: add depth only once a shallower network already shows some effect. Depth is added to raise the model's accuracy — if the shallow layers learn nothing, going deeper will not help either.
+
+### 10.1.3 Kernel sizes are usually odd.
+
 Odd-sized kernels have two advantages:
 - 1. The anchor sits exactly at the center, so the sliding convolution aligns on the central pixel and positional information does not drift.
 - 2. When padding adds extra zero borders around the image, both sides of the image stay symmetric.
+### 10.1.4 Bigger kernels are not always better.
+
 AlexNet used some very large kernels, such as 11×11 and 5×5. The earlier belief was that a larger kernel means a larger receptive field that sees more of the image and therefore yields better features. But large kernels explode the computation, which hinders increasing the depth and degrades performance. VGG and the Inception networks then showed that a stack of two 3×3 kernels works better than a single 5×5 kernel while using fewer parameters (3×3×2+1 = 19 < 26 = 5×5×1+1), after which the 3×3 kernel was adopted widely across models; the short sketch below works through the parameter count.
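To make §10.1.4 concrete, here is a short self-contained check of the parameter counts (our own illustration, not from the book; the 64-channel width is an assumed value):

```python
def conv_params(k, c_in, c_out):
    """Parameter count of one k x k convolution layer: weights + biases."""
    return k * k * c_in * c_out + c_out

# Single-channel case from the text: one 5x5 kernel vs. a stack of two 3x3.
print(conv_params(5, 1, 1))      # 26, as in the text
print(2 * conv_params(3, 1, 1))  # 20 (the text's 19 counts only one bias term)

# The gap widens at realistic widths (64 channels is an assumed example):
print(conv_params(5, 64, 64))      # 102464
print(2 * conv_params(3, 64, 64))  # 73856 -- same 5x5 receptive field, cheaper
```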
-10.2 Which classic network models are worth studying?
+## 10.2 Which classic network models are worth studying?
+
 Any mention of classic network models leads straight to the classic computer-vision competition ILSVRC, the ImageNet Large Scale Visual Recognition Challenge. AlexNet's debut at ILSVRC 2012 set off the worldwide wave of deep learning, and that year is often called "year one of deep learning". The networks that broke the records at ILSVRC over the years became the recognized classics, studied and reproduced by academia and industry alike and used as the basis of new research.
@@ -40,7 +47,7 @@
 >- 2 VGGNet
 Paper: [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
-Implementation: [tensorflow]https://github.com/tensorflow/tensorflow/blob/361a82d73a50a800510674b3aaa20e4845e56434/tensorflow/contrib/slim/python/slim/nets/vgg.py)
+Implementation: [tensorflow](https://github.com/tensorflow/tensorflow/blob/361a82d73a50a800510674b3aaa20e4845e56434/tensorflow/contrib/slim/python/slim/nets/vgg.py)
 Key features:
 >> - 1. A deeper network structure.
 >> - 2. Small convolution kernels used throughout.
@@ -64,35 +71,58 @@
 >> Proposed feature recalibration: by introducing attention-based re-weighting, it suppresses ineffective features and raises the weights of effective ones; it combines easily with existing networks and improves them, with little extra computation.
 **Evolution of network architectures in CV:**
-![Evolution of network architectures in CV](http://wx2.sinaimg.cn/mw690/005B3ViFly1fwthh0jw58j30q80aldgw.jpg)
+![Evolution of network architectures in CV](./img/ch12/网络结构演进.png)
 **ILSVRC winners over the years:**
-![ILSVRC winners over the years](http://wx4.sinaimg.cn/mw690/005B3ViFly1fwswhzquw2j31810or78b.jpg)
+![ILSVRC winners over the years](./img/ch12/历年冠军.png)
 Since then, an institution's or company's ILSVRC ranking has been an important yardstick of its technical level. ILSVRC 2017 was the last edition; from 2018 the baton passes to the WebVision challenge (Challenge on Visual Understanding by Learning from Web Data). Even though ILSVRC has ended, its deep influence on and great contribution to deep learning will endure.
-10.3 Are there tricks for training networks?
-10.3.1 A suitable dataset.
+## 10.3 Are there tricks for training networks?
+
+### 10.3.1 A suitable dataset.
+
 - 1. No obviously dirty data (this largely prevents the loss from becoming NaN).
 - 2. A sample set with an even distribution.
+### 10.3.2 Suitable preprocessing.
+
 Before Batch Normalization appeared, preprocessing mainly meant subtracting the mean and dividing by the variance; since Batch Normalization, that step is no longer necessary. Preprocessing now mainly means data screening, data augmentation, and the like.
+### 10.3.3 Network initialization.
+
 The crudest initialization sets every parameter to zero, which must never be done: if all parameters are zero, every neuron produces the same output, and during back-propagation all neurons within a layer behave identically, which can make the model fail outright and never converge. The method in Andrew Ng's lectures is to initialize the weights with random values drawn from a normal distribution with mean 0 and variance 1.
+### 10.3.4 A trial run on small data.
+
 Before formal training, do a trial run on a small dataset, because:
 - 1. It verifies that your training pipeline is correct.
 - 2. It lets you observe the convergence speed and helps tune the learning rate.
 - 3. It shows GPU memory usage, so you can maximize batch_size (assuming batch normalization is used: as long as the card holds up, pick it large).
+### 10.3.5 A reasonable learning rate.
+
 - 1. Too large: the loss explodes, outputs become NaN, and so on.
 - 2. Too small: convergence is very slow and training time balloons.
 - 3. A variable learning rate: for instance, once accuracy reaches some threshold, halve the learning rate and continue training.
+### 10.3.6 Loss functions
+
+Loss functions fall into two broad classes — classification losses and regression losses (a NumPy sketch of the four losses named here follows this section):
+>1. Regression losses:
+>> - 1 Mean squared error (MSE; quadratic loss; L2 loss): the squared difference between the target value and the predicted value.
+>> - 2 Mean absolute error (MAE; L1 loss): the absolute difference between the target value and the predicted value.
+>Comparing the two: MSE is easier to optimize, but MAE is more robust to outliers. For more on how MAE and MSE behave, see [L1 vs. L2 Loss Function](https://rishy.github.io/ml/2015/07/28/l1-vs-l2-loss/).
+>2. Classification losses:
+>> - 1 Cross-entropy loss: currently the most common classification loss in neural networks.
+>> - 2 Hinge loss: widely used in support vector machines, and occasionally in neural networks as well. Drawback: hinge loss punishes larger errors more heavily, which makes it sensitive to noise.
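As a concrete companion to §10.3.6, here is a minimal NumPy sketch of the four losses (our own illustration; the label encodings are assumptions):

```python
import numpy as np

def mse(y, p):   # L2 / quadratic loss
    return np.mean((y - p) ** 2)

def mae(y, p):   # L1 loss, more robust to outliers
    return np.mean(np.abs(y - p))

def cross_entropy(y, p, eps=1e-12):
    """y: one-hot labels, p: predicted class probabilities."""
    return -np.mean(np.sum(y * np.log(p + eps), axis=1))

def hinge(y, s):
    """y: labels in {-1, +1}, s: raw scores; penalty grows with margin violation."""
    return np.mean(np.maximum(0.0, 1.0 - y * s))

y_reg = np.array([1.0, 2.0, 8.0])
p_reg = np.array([1.1, 1.9, 2.0])
print(mse(y_reg, p_reg), mae(y_reg, p_reg))  # the outlier inflates MSE far more

y_cls = np.array([[1, 0], [0, 1]])
p_cls = np.array([[0.9, 0.1], [0.2, 0.8]])
print(cross_entropy(y_cls, p_cls))
print(hinge(np.array([1, -1]), np.array([0.8, -2.0])))
```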
diff --git a/ch15_GPU和框架选型/img/ch15/cuda8.0.png b/ch15_GPU和框架选型/img/ch15/cuda8.0.png
new file mode 100644
index 0000000..661c4e8
Binary files /dev/null and b/ch15_GPU和框架选型/img/ch15/cuda8.0.png differ
diff --git a/ch15_GPU和框架选型/img/ch15/cuda9.0.png b/ch15_GPU和框架选型/img/ch15/cuda9.0.png
new file mode 100644
index 0000000..cdc6206
Binary files /dev/null and b/ch15_GPU和框架选型/img/ch15/cuda9.0.png differ
diff --git a/ch15_GPU和框架选型/第十五章_异构运算、GPU及框架选型.md b/ch15_GPU和框架选型/第十五章_异构运算、GPU及框架选型.md
index fefbd09..c248391 100644
--- a/ch15_GPU和框架选型/第十五章_异构运算、GPU及框架选型.md
+++ b/ch15_GPU和框架选型/第十五章_异构运算、GPU及框架选型.md
@@ -77,19 +77,67 @@ Nvidia offers GPUs for individual users (e.g. the GTX series) and enterprise users (e.g. the Tesla series)
 Nvidia generally releases a new GPU generation every year or two; for example, the GTX 1000 series came out in 2017. Each series contains several models at different performance levels.
 ## 15.6 Setting up the software environment
 Deep learning means building algorithms and training models on top of a complete software stack. How do you put such a stack together? How do you choose an operating system, and what problems come up during installation? This section gives a brief summary.
 ### 15.6.1 Choosing an operating system
-Hardware vendors such as NVIDIA support all the common operating systems well — e.g. Windows 10 and the Linux family — but because Linux is very friendly to technical professionals, almost all deep learning systems today are built on Linux, most commonly Ubuntu or CentOS.
+Hardware vendors such as NVIDIA support all the common operating systems well — e.g. the Windows family and the Linux family — but because Linux is friendly to technical professionals, almost all deep learning systems today are built on Linux, most commonly Ubuntu or CentOS.
 For a newcomer to deep learning wondering which system to choose, some advice:
-(1) If you have just started, know Windows, and know neither Linux nor deep learning, begin your studies on Windows 10 or a similar system.
-(2) If you know a little Linux but not much deep learning, build the stack directly on Linux, run some open-source projects, and study as you go.
-(3) If you know Linux well — then, without question, use Linux: software installation is simple and productivity is high.
+(1) If you have just started, know Windows, and know neither Linux nor deep learning, begin your studies on a Windows system.
+(2) If you know a little Linux but not much deep learning, build the stack directly on Linux, run some open-source projects, and deepen your study step by step.
+(3) If you know Linux well but not deep learning theory — then, without question, use Linux: software installation is simple and productivity is high.
 In short: if Linux is unfamiliar, get familiar with it gradually — in the end you will come back to Linux to build deep learning systems.
-### 15.6.2 Install natively or use Docker?
+### 15.6.2 Installing the basic software
+There are many deep learning frameworks to choose from, but nearly all of them share one trait: at present they train models on NVIDIA GPUs, and to use those GPUs well, CUDA and cuDNN are required installs.
+1. Installing CUDA
+CUDA was introduced earlier; here we only outline installing it on Linux. Install CUDA 8.0 or CUDA 9.0 as needed — the steps are essentially the same. We take the most common Ubuntu 16.04 LTS as the example:
+(1) Download from the official pages:
+cuda8.0 https://developer.nvidia.com/cuda-80-ga2-download-archive
+cuda9.0 https://developer.nvidia.com/cuda-90-download-archive
+Open the page and pick your system version, as shown below:
+![cuda8.0](./img/ch15/cuda8.0.png)
+![cuda9.0](./img/ch15/cuda9.0.png)
+(2) In a terminal, change to the directory holding the installer and make it executable:
+cuda8.0: sudo chmod +x cuda_8.0.61_375.26_linux.run
+cuda9.0: sudo chmod +x cuda_9.0.176_384.81_linux.run
+(3) Run the installer:
+cuda8.0: sudo sh cuda_8.0.61_375.26_linux.run
+cuda9.0: sudo sh cuda_9.0.176_384.81_linux.run
+The prompts that follow are almost identical for 8.0 and 9.0:
+ 1) First comes the license text; press q to skip reading it.
+ 2) Do you accept the previously read EULA? accept/decline/quit: **accept**
+ 3) Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81? (y)es/(n)o/(q)uit: **no**
+ 4) Install the CUDA 9.0 Toolkit? (y)es/(n)o/(q)uit: **yes**
+ 5) Enter Toolkit Location [ default is /usr/local/cuda-9.0 ]: just press Enter
+ 6) Do you want to install a symbolic link at /usr/local/cuda? (y)es/(n)o/(q)uit: **yes**
+ 7) Install the CUDA 9.0 Samples? (y)es/(n)o/(q)uit: **yes**
+These are the basic CUDA installation steps.
+
+2. Installing cuDNN
+cuDNN is NVIDIA's acceleration library built specifically for deep learning...
+
-### 15.6.3 GPU driver issues
+### 15.6.3 Install natively or use Docker?
+
+### 15.6.4 GPU driver issues
 ## 15.7 Choosing a framework
@@ -102,6 +150,13 @@
 * Tensorflow
 * PyTorch
+PyTorch is the deep learning framework that Facebook released only in 2017 — quite late compared with the others, but that is also an advantage: its design avoids many problems of earlier frameworks, so it was warmly received as soon as it appeared.
+Pros: a clean, consistent API; seamless integration with Python; well-designed, readable code; a very active community; prompt official bug fixes.
+Cons: model deployment in industry currently lags the other frameworks slightly, though the upcoming PyTorch 1.0 should improve this greatly — after the merger with Caffe2, Caffe2's excellent deployment capability can make up for the weakness.
+Resources:
+(1) Official tutorials: https://pytorch.org/tutorials/
+(2) A curated list of PyTorch projects: https://github.com/bharathgs/Awesome-pytorch-list
+(3)
 * Keras
@@ -158,7 +213,7 @@ mxnet's best-known strengths are its multi-GPU support and scalability; its excellent
 * Tensorflow
-* PyTorch
+* PyTorch
 ### 15.8.2 Can training be distributed?
@@ -234,9 +289,7 @@ Keras is a high-level programming interface that can choose among different backends, e.g. Tensor(flow)
 Con: it is wrapped so thoroughly that the technical details underneath stay hidden.
 3. PyTorch:
-PyTorch is the deep learning framework that Facebook released only in 2017 — quite late compared with the others, but that is also an advantage: its design avoids many problems of earlier frameworks, so it was warmly received as soon as it appeared.
-Pros: a clean, consistent API; seamless integration with Python; well-designed, readable code; a very active community; prompt official bug fixes.
-Cons: model deployment in industry currently lags the other frameworks slightly, though the upcoming PyTorch 1.0 should improve this greatly — after the merger with Caffe2, Caffe2's excellent deployment capability can make up for the weakness.
+
 4. Caffe2:
 Caffe2 is the second-generation framework after Caffe, also from Facebook...
diff --git a/ch1_数学基础/第一章_数学基础.md b/ch1_数学基础/第一章_数学基础.md
index 3574e6d..1b3708d 100644
--- a/ch1_数学基础/第一章_数学基础.md
+++ b/ch1_数学基础/第一章_数学基础.md
@@ -159,7 +159,7 @@
 Multiplying the three matrices on the right yields a matrix close to $A$; here, the closer $r$ is to $n$, the closer the product is to $A$.
 ## 1.10 Why does machine learning use probability?
-The probability of an event measures how likely that "time" [typo for "event"] is to occur. Although the occurrence of an event in a single random trial is a matter of chance, random trials that can be repeated many times under the same conditions tend to show clear quantitative regularities.
+The probability of an event measures how likely that event is to occur. Although the occurrence of an event in a single random trial is a matter of chance, random trials that can be repeated many times under the same conditions tend to show clear quantitative regularities.
 Machine learning must handle random quantities as well as uncertain quantities. Uncertainty and randomness can arise from many sources; probability theory is used to quantify uncertainty.
 Probability theory plays a central role in machine learning, because the design of learning algorithms usually relies on probabilistic assumptions about the data.
diff --git a/ch3_深度学习基础/第三章_深度学习基础.md b/ch3_深度学习基础/第三章_深度学习基础.md
index 93ecb41..edab6dd 100644
--- a/ch3_深度学习基础/第三章_深度学习基础.md
+++ b/ch3_深度学习基础/第三章_深度学习基础.md
@@ -865,7 +865,7 @@ BN is best suited to cases where each mini-batch is fairly large and the data distributions are close. ... Pretraining eventually reaches a rather good local optimum.
-### 3.7.2 What is model f ine tuning
+### 3.7.2 What is model fine-tuning
 Training with someone else's parameters, a modified network, and your own data, so that the parameters adapt to your own data — this process is usually called fine-tuning.
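The definition of fine-tuning above can be made concrete with a short sketch. The snippet below is a common PyTorch pattern of this era and is not code from the book; torchvision's pretrained ResNet-18 and the 10-class target task are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Load a network pretrained on ImageNet -- "someone else's parameters".
model = models.resnet18(pretrained=True)

# Freeze the pretrained backbone so only the new head adapts to our data.
for param in model.parameters():
    param.requires_grad = False

# Modify the network: replace the classifier head for our own task
# (10 target classes is an assumed example).
model.fc = nn.Linear(model.fc.in_features, 10)

# Train as usual on our own data, optimizing only the new head.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```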
diff --git a/ch4_经典网络/modify_log.txt b/ch4_经典网络/modify_log.txt
index 330454f..1388880 100644
--- a/ch4_经典网络/modify_log.txt
+++ b/ch4_经典网络/modify_log.txt
@@ -24,6 +24,9 @@ modify_log----> a log of the modifications
 1. Deleted 4.6.3 and 4.8.4
 2. Revised the answer to 4.3; added some paper links
+<----qjhuang-2018-11-9---->
+1. Fixed some typos
+
 Others----> to be added
 2. Revise the README
 3. Revise the modify log
diff --git a/ch4_经典网络/第四章_经典网络.md b/ch4_经典网络/第四章_经典网络.md
index 7e78952..480508e 100644
--- a/ch4_经典网络/第四章_经典网络.md
+++ b/ch4_经典网络/第四章_经典网络.md
@@ -421,7 +421,7 @@ figure 6': the feature-map size shrinks between the 17 and 8 stages
 ![](./img/ch4/image47.png)
-The Inception modules of Inception v4 (Inception A, Inception B, and Inception C
+The Inception modules of Inception v4 (Inception A, Inception B, and Inception C)
 ![](./img/ch4/image48.png)
@@ -466,8 +466,6 @@ The reduction modules of Inception-ResNet-v2 (reduction A and reduction B)
 ![](./img/ch4/image63.png)
 # 4.8 ResNet and its variants
-
-http://www.sohu.com/a/157818653_390227
 Since AlexNet's victory in the LSVRC 2012 classification competition, the deep residual network (Deep Residual Network) has arguably been the most groundbreaking result of the past few years in the computer-vision and deep-learning communities. ResNet makes it possible to train networks with hundreds or even thousands of layers while still achieving excellent performance.
diff --git a/ch5_卷积神经网络(CNN)/第五章_卷积神经网络(CNN).md b/ch5_卷积神经网络(CNN)/第五章_卷积神经网络(CNN).md
index ac922ad..7c8cb8d 100644
--- a/ch5_卷积神经网络(CNN)/第五章_卷积神经网络(CNN).md
+++ b/ch5_卷积神经网络(CNN)/第五章_卷积神经网络(CNN).md
@@ -1,33 +1,40 @@
 # Chapter 5  Convolutional Neural Networks (CNN)
 Tags (space-separated): deep learning
---
+    Markdown Revision 1;
+    Date: 2018/11/08
+    Editor: Li Xiaodan - Duke University
+    Contact: xiaodan.li@duke.edu

## 5.1 The layers that make up a convolutional neural network

+1. Convolutional neural networks are mainly applied to datasets with a grid-like topology.
+2. They have been applied widely, and with great success, to speech recognition, image processing, and face recognition.
+3. A CNN contains three main kinds of layers:
+
+> * Convolution layers: filters convolve the input features to obtain local features, which make up the feature maps.
+> * Pooling layers: pooling takes the maximum, the sum, or the average, and operates on the feature maps.
+> * Fully connected layers: on top of the pooled features, fully connected layers perform classification and other predictions.

A complete network is built by stacking these three kinds of layers.

**An example structure**

Take the CIFAR-10 dataset: a typical CNN classifier has the structure [INPUT - CONV - RELU - POOL - FC]:

+> * INPUT [32\*32\*3] holds the raw pixels of the image: width and height 32, with 3 RGB color channels.
+> * In the CONV layer, each neuron connects to a small region of the previous layer and computes the inner product of its weights with that region's pixels; the output might be, say, [32\*32\*12].
+> * The RELU layer is the activation layer; its computation is just max(0, x), and the output stays [32\*32\*12].
+> * The POOLing operation can be understood as downsampling; the output dimensions become [16\*16\*12].
+> * The fully connected layer usually computes the final class scores, giving [1\*1\*10], where 10 corresponds to the 10 classes. Every neuron in this layer connects to every neuron in the previous layer.

In this way the CNN, acting as an intermediate pipeline, converts the raw image step by step into the final class scores. One point worth noting: of the layer types just listed, some require parameter training and some do not. In detail, convolution and fully connected layers carry weights and biases, while RELU and POOLing layers are fixed function computations with no weights or biases — though the POOLing layer does have hyperparameters that we set by hand.

**To summarize**:

+* A CNN is a stack of layers of several different types (convolution / RELU / POOLing / fully connected, etc.).
+* Each layer takes a 3-D volume of data as input and outputs a 3-D volume of data.
+* Convolution and fully connected layers have trainable parameters, but RELU and POOLing layers do not.
+* Convolution, fully connected, and POOLing layers have hyperparameters; the RELU layer has none.

Below is a schematic of a convolutional neural network built for the CIFAR-10 dataset; a minimal PyTorch sketch of the same layer stack follows.
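To make the [INPUT - CONV - RELU - POOL - FC] pipeline concrete, here is a minimal PyTorch sketch (our own illustration, not from the book; the 12 channels follow the 32\*32\*12 → 16\*16\*12 example in the text):

```python
import torch
import torch.nn as nn

# INPUT [3,32,32] -> CONV -> RELU -> POOL, as described above.
features = nn.Sequential(
    nn.Conv2d(3, 12, kernel_size=3, padding=1),  # [3,32,32] -> [12,32,32]
    nn.ReLU(),                                   # fixed function, no parameters
    nn.MaxPool2d(2),                             # downsample -> [12,16,16]
)
classifier = nn.Linear(12 * 16 * 16, 10)         # FC: scores for 10 classes

x = torch.randn(1, 3, 32, 32)                    # one CIFAR-10-sized image
h = features(x)
scores = classifier(h.view(h.size(0), -1))       # flatten, then classify
print(scores.shape)                              # torch.Size([1, 10])
```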
## 5.2 How does convolution detect edges?

Convolution is the most fundamental building block of a convolutional neural network. The first few layers of a network detect edges; later layers may detect parts of objects; still later layers may detect complete objects.

First, a concept — the filter:

![image](./img/ch5/img2.png)

As shown above, a filter is a weight matrix of some chosen size, such as 3\*3 or 5\*5.

Given the 6\*6 single-channel grayscale image below, convolve it with a 3\*3 filter:

![image](./img/ch5/img3.png)

The convolution proceeds as follows (blue is the image, the moving shadow the filter, green the result):

![image](./img/ch5/img4.png)

As the dark-blue region shows, the filter first covers the 3\*3 region at the top-left of the image; multiplying elementwise and summing gives -5. The filter is then shifted right and the operation repeated, then shifted down, until the filter reaches the last position at the bottom-right of the image. The results form a 4\*4 matrix.

Having seen filters and the convolution operation, let us see why a filter can detect object edges, using the simplest possible example:

![image](./img/ch5/img5.png)

This image is all white on its left half and all gray on its right half. Convolve it with the same filter as before:

![image](./img/ch5/img6.png)

The final result has a band of white in the middle with gray on either side — the vertical edge has been found. Why? In the red-boxed part of the 6\*6 image — exactly where the boundary lies — convolution with the filter yields 30, while convolution over every region away from the boundary yields 0.

The white boundary band looks thick here only because the 6\*6 image is tiny; on a 1000\*1000 image the boundary would come out thin but clearly visible.

That is vertical-edge detection; to detect horizontal edges, simply rotate the filter by 90 degrees.

## 5.3 Basic definitions for convolution

First, we define the parameters of a convolution layer.

### 5.3.1 Kernel size

(Kernel Size): the kernel size defines the convolution's field of view. A common choice in 2-D is 3, i.e. 3×3 pixels.

### 5.3.2 Stride

(Stride): the stride defines the kernel's step length. Its default value is usually 1, but setting the stride to 2 downsamples an image much like MaxPooling.

### 5.3.3 Padding

@@ -88,40 +95,41 @@

### 5.3.4 Input and output channels

A convolution layer accepts a certain number of input channels (I) and computes a specific number of output channels (O). The layer's parameter count is I\*O\*K, where K is the number of values in one kernel.

## 5.4 Types of convolution

### 5.4.1 Ordinary convolution

+Ordinary convolution is shown below.
![image](./img/ch5/img7.png)

### 5.4.2 Dilated convolution

Also called atrous convolution, dilated convolution introduces a further parameter, the dilation rate, which defines the spacing between the values in a kernel. A 3×3 kernel with dilation rate 2 has the same field of view as a 5×5 kernel while using only 9 parameters — picture a 5×5 kernel with every second row and column removed.

This gives a wider field of view at the same computational cost. Dilated convolutions are especially popular in real-time segmentation: use them when you need a wide view and cannot afford multiple convolutions or larger kernels.

An example:
![image](./img/ch5/img8.png)

### 5.4.3 Transposed convolution

Transposed convolution is often called deconvolution, but strictly speaking that name is inappropriate, because the operation is not a deconvolution. Deconvolutions do exist, but they are rare in deep learning. A true deconvolution reverses the process of a convolution: imagine feeding an image through a convolution layer, passing the output into a black box, and getting the original image back out — that black box has performed a deconvolution, the mathematical inverse of the convolution layer.

A transposed convolution is somewhat similar, in that it produces the same spatial resolution a hypothetical deconvolution layer would. However, the actual arithmetic performed on the values differs: a transposed convolution layer carries out an ordinary convolution, arranged so as to undo the spatial transformation.

At this point you may be quite confused, so let us look at a concrete example: a 5×5 image is fed into a convolution layer with stride 2, no padding, and a 3×3 kernel, producing a 2×2 image.

To invert this process, we would need the reverse arithmetic — generating 9 values from each input pixel and stepping over the output image with stride 2. That would be a deconvolution.

![image](./img/ch5/img9.png)

A transposed convolution does not do that. The only thing the two share is the guarantee that the output will again be a 5×5 image, while still performing a normal convolution; to achieve this, some elaborate padding is applied to the input.

As you can now picture, this step does not invert the process above — at least not numerically.

It merely reconstructs the earlier spatial resolution and performs a convolution. That is not the mathematical inverse, but for encoder-decoder architectures it is still very useful, since it lets us combine the upscaling of an image with a convolution in a single step instead of two separate processes; a shape-level PyTorch check follows.
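The 5×5 → 2×2 → 5×5 example above can be verified at the shape level in PyTorch (our own sketch; the single-channel sizes mirror the example in the text):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)                        # the 5x5 input image

conv = nn.Conv2d(1, 1, kernel_size=3, stride=2)    # stride 2, no padding
y = conv(x)
print(y.shape)                                     # torch.Size([1, 1, 2, 2])

# A transposed convolution restores the 5x5 *shape* of the input:
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2)
print(deconv(y).shape)                             # torch.Size([1, 1, 5, 5])
```

As the text stresses, this recovers only the spatial size, not the original pixel values — it is not a true mathematical inverse.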
diff --git a/ch6_循环神经网络(RNN)/第六章_循环神经网络(RNN).md b/ch6_循环神经网络(RNN)/第六章_循环神经网络(RNN).md
index 33675c7..7f41126 100644
--- a/ch6_循环神经网络(RNN)/第六章_循环神经网络(RNN).md
+++ b/ch6_循环神经网络(RNN)/第六章_循环神经网络(RNN).md
@@ -1,5 +1,10 @@
 # Chapter 6  Recurrent Neural Networks (RNN)
+    Markdown Revision 2;
+    Date: 2018/11/07
+    Editor: Li Xiaodan - Duke University
+    Contact: xiaodan.li@duke.edu
+
     Markdown Revision 1;
     Date: 2018/10/26
     Editor: Yang Guofeng - Chinese Academy of Agricultural Sciences

 http://blog.csdn.net/heyongluoyao8/article/details/48636251

## 6.1 How do RNNs differ from FNNs?

+1. Unlike traditional feed-forward neural networks (FNNs), RNNs introduce directed cycles, which let them handle problems where successive inputs are related to one another.
+2. RNNs can memorize information from earlier steps of processing.

**The directed-cycle structure is shown below**:

![](./img/ch6/figure_6.1_1.jpg)

## 6.2 What are the typical characteristics of RNNs?

+1. RNNs are mainly used to process sequence data. In a traditional neural network model, information flows from the input layer through the hidden layers to the output layer; adjacent layers are generally fully connected, while neurons within a layer are unconnected. Such ordinary networks are powerless against many problems. For example, predicting the next word of a sentence usually requires the semantic information of the words before it, because the words of a sentence are not independent — they are semantically linked.
+2. RNNs are called recurrent because the output of the current unit also depends on the outputs of earlier steps: concretely, the current cell stores information from previous steps and applies it when computing the current output. The hidden-layer nodes are connected to one another, and the current hidden output is determined jointly by the current input vector and the hidden state of the previous time step.
+3. In theory, RNNs can process sequences of any length; in practice, to reduce complexity, the current state is often assumed to depend only on the states of a few preceding steps. **A typical RNN is shown below**:

![](./img/ch6/figure_6.2_1.png)

**Information flow in the figure:**

+1. One one-way stream of information flows from the input units to the hidden units.
+2. Another one-way stream flows from the hidden units to the output units.
+3. In some cases, RNNs break the latter restriction and lead information from the output units back to the hidden units; these links are called "Back Projections".
+4. In some cases, the hidden layer's input also includes the hidden state of the previous time step, i.e. nodes within the hidden layer may be self-connected or inter-connected.
+5. The output of the current cell is determined jointly by the current input and the previous time step's hidden state.

## 6.3 What can RNNs do?

RNNs have proven very successful for NLP: word-vector representations, sentence-validity checking, part-of-speech tagging, and more. Among RNNs and their variants, the most widely used and most successful model is the LSTM (Long Short-Term Memory), which describes long- and short-term dependencies better than vanilla RNNs.

## 6.4 Typical applications of RNNs in NLP?

**(1) Language modeling and generating text**

Given a sequence of words, predict the probability of each word given those before it. A language model scores how likely a sentence is to be correct — the higher the probability, the more correct the sentence — and this is one component of machine translation. Another application is to use a generative model to predict the probability of the next word and to generate new text by sampling from the output distribution.

**(2) Machine translation**

**(3) Speech recognition**

Speech recognition means: given an acoustic sound-wave signal, predict the sentence in a specified source language that corresponds to the wave, together with the probability of that sentence.

**(4) Generating image descriptions**

Like convolutional neural networks (CNNs), RNNs have been applied to automatically generating descriptions of unlabeled images; CNN-RNN combinations are also applied to this task.

![](./img/ch6/figure_6.4_1.png)

## 6.5 How does training RNNs compare with training traditional ANNs?

**Similarities**:
+1. Both RNNs and traditional ANNs use back-propagation (BP) of the error.

**Differences**:
+1. The RNN parameters W, U, V are shared across the unrolled network, whereas in a traditional neural network the layers' parameters are not directly tied to one another.
+2. Under gradient descent, the output at each step depends not only on the network at the current step but also on the network states of several previous steps (a minimal forward-pass sketch follows this list).
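A minimal NumPy forward pass (our own sketch; the tanh activation and the dimensions are assumptions) makes both differences concrete — one shared set of parameters (U, W, V) is reused at every step, and each hidden state feeds into the next:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 3
U = rng.normal(size=(n_hid, n_in))    # input -> hidden, shared by all steps
W = rng.normal(size=(n_hid, n_hid))   # hidden -> hidden (the recurrence)
V = rng.normal(size=(n_out, n_hid))   # hidden -> output

xs = rng.normal(size=(5, n_in))       # a length-5 input sequence
h = np.zeros(n_hid)
for x_t in xs:
    h = np.tanh(U @ x_t + W @ h)      # h_t depends on x_t and h_{t-1}
    y = V @ h                         # output at step t
print(y)
```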
## 6.6 Common extensions and improvements of RNNs

### 6.6.1 Simple RNNs (SRNs)

+1. SRNs are a special case of RNNs: a three-layer network with context units added alongside the hidden layer. In the figure below, **y** is the hidden layer and **u** the context units. The connections between context nodes and hidden nodes are fixed — both which node connects to which, and the weight values. In effect, each context node corresponds one-to-one to a hidden node, with a fixed weight.
+2. At each step, standard feed-forward propagation runs, and then the learning algorithm is applied. Each context node stores the previous-step output of its hidden node — i.e. the context — and feeds it into that hidden node's state at the current step, so the hidden layer's input is determined by the input layer's output together with the hidden layer's own previous state. This is why SRNs can solve sequence-prediction tasks that a standard multilayer perceptron (MLP) cannot.

**SRN structure:**

![](./img/ch6/figure_6.6.1_1.png)

### 6.6.2 Bidirectional RNNs

Bidirectional RNNs stack two RNNs one above the other, so that the output at step t depends not only on the earlier elements of the sequence but also on the later ones. For example, predicting a missing word in a sentence requires the word's context on both sides. The output is determined jointly by the hidden states of the forward RNN and the backward RNN. **As shown below**:

![](./img/ch6/figure_6.6.2_1.png)

### 6.6.3 Deep RNNs

Similar to bidirectional RNNs, deep RNNs stack multiple RNN layers, so every step's input passes through several layers of network. This gives stronger representational and learning capacity, at the price of higher complexity and a larger demand for training data. **The structure of deep RNNs is shown below**:

![](./img/ch6/figure_6.6.3_1.png)

### 6.6.4 Echo State Networks (ESNs)

ESNs (echo state networks) are also a kind of RNN, but they differ considerably from traditional RNNs.

**ESNs have three characteristics**:

+1. The core structure is a randomly generated reservoir that is then kept fixed: a large-scale, randomly generated, sparsely connected recurrent structure (the sparsity SD — the fraction of mutually connected neurons among the N reservoir neurons — is usually kept at 1%–5%).
+2. The weight matrix from the reservoir to the output layer is the only part that needs adjusting.
+3. Simple linear regression is enough to train the network (a toy NumPy sketch follows this section).

Structurally, an ESN is a special type of recurrent network whose basic idea is to replace the middle layer of a classic network with a large, randomly connected recurrent reservoir, thereby simplifying training. The key component of an ESN is therefore the reservoir.
The network's parameters are:
(1) W — the weight matrix of the connections among reservoir nodes;
(2) Win — the weight matrix from the input layer into the reservoir (the reservoir neurons are connected among themselves);
(3) Wback — the feedback weight matrix from the output layer back into the reservoir (the reservoir receives feedback from the output layer);
(4) Wout — the weight matrix from the input layer, the reservoir, and the output layer to the output layer (the output layer connects not only to the reservoir but also to the input layer and to itself);
Woutbias — the bias term of the output layer.
The key design choices are the reservoir's four parameters: the spectral radius SR of the internal weights (SR = λmax = max{|eigenvalues of W|}; the network has the echo-state property only when SR < 1); the reservoir size N (the number of reservoir neurons); the input scaling IS (a scale factor multiplied into the input signal before it reaches the reservoir neurons); and the sparsity SD (the fraction of mutually connected neurons in the reservoir). As for IS: the stronger the nonlinearity of the task, the larger the input scaling should be — in essence, IS maps the input into the appropriate operating range of the neurons' activation function (different input ranges of the activation function give different degrees of nonlinearity).

**ESN structure:**

![](./img/ch6/figure_6.6.4_1.png)
![](./img/ch6/figure_6.6.4_2.png)
![](./img/ch6/figure_6.6.4_3.png)
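Points 2–3 above — only Wout is trained, by plain linear regression — can be shown in a compact NumPy sketch (our own illustration; the reservoir size, sparsity, and spectral radius are assumed values, and the feedback weights Wback are omitted for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_in = 100, 1                          # reservoir size and input width (assumed)

Win = rng.uniform(-0.5, 0.5, (N, n_in))   # fixed input weights
W = rng.uniform(-0.5, 0.5, (N, N))
W *= rng.random((N, N)) < 0.05            # sparsity SD ~ 5%
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius SR = 0.9 < 1

# Drive the fixed reservoir with a toy input sequence.
xs = np.sin(np.linspace(0, 8 * np.pi, 300))[:, None]
states = np.zeros((len(xs), N))
h = np.zeros(N)
for t, x in enumerate(xs):
    h = np.tanh(Win @ x + W @ h)          # reservoir update; W and Win never change
    states[t] = h

# Train only Wout, by ordinary least squares, to predict the next input.
target = np.roll(xs[:, 0], -1)
Wout, *_ = np.linalg.lstsq(states[:-1], target[:-1], rcond=None)
print(states[:-1] @ Wout)                 # readout predictions
```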
Structurally, ESNs are a special type of recurrent neural network. The basic idea is to replace the middle layer of a classical neural network with a large, randomly connected recurrent network, thereby simplifying training. The key to ESNs is therefore the reservoir.

The network parameters are:
(1) W: the weight matrix of connections between reservoir nodes, indicating that the reservoir neurons are interconnected;
(2) Win: the weight matrix of connections from the input layer to the reservoir;
(3) Wback: the weight matrix of feedback connections from the output layer to the reservoir, indicating that the reservoir receives feedback from the output layer;
(4) Wout: the weight matrix of connections from the input layer, the reservoir, and the output layer itself to the output layer, indicating that the output layer is connected not only to the reservoir but also to the input layer and to itself;
Woutbias: the bias term of the output layer.

For ESNs, the key lies in four reservoir parameters: the spectral radius SR of the internal connection weights (SR = λmax = max{|eigenvalues of W|}; the ESN has the echo-state property only when SR < 1), the reservoir size N (the number of reservoir neurons), the input scaling IS (a scale factor applied to the input signal before it is fed into the reservoir neurons), and the reservoir sparsity SD (the fraction of interconnected neurons among all reservoir neurons). As for IS, the more nonlinear the task at hand, the larger the input scaling should be: in essence, IS maps the input into the appropriate range of the neuron activation function, since different input ranges of the activation function exhibit different degrees of nonlinearity.

**The ESN structure is shown below**:

![](./img/ch6/figure_6.6.4_1.png)
![](./img/ch6/figure_6.6.4_2.png)
![](./img/ch6/figure_6.6.4_3.png)

### 6.6.5 Gated Recurrent Unit Recurrent Neural Networks

GRUs are a variant of ordinary RNNs, improving on them in the following **two respects**:

1. Words at different positions in the sequence (taking a sentence as the example) influence the current hidden state differently: the further back a word is, the smaller its influence. That is, each earlier state's influence on the current one is weighted by distance, with weights shrinking as distance grows.
2. When an error is produced, it may have been caused jointly by one or several earlier words, so only the weights of the corresponding words should be updated. The GRU structure is shown in the figure below. A GRU first computes the update gate and the reset gate from the current input word vector and the previous hidden state. It then computes the new memory content from the reset gate, the current word vector, and the previous hidden state. When the reset gate is 0, the new memory content ignores all previous memory content; the final memory is determined jointly by the previous hidden state and the new memory content.

![](./img/ch6/figure_6.6.5_1.png)

### 6.6.6 LSTM Networks

1. LSTMs are currently a very popular deep-learning model. To address the long-term memory problem of RNNs, LSTMs make use of training information from many more earlier steps.
2. Structurally, LSTMs are not fundamentally different from ordinary RNNs; they just use different functions to control the hidden state.
3. In LSTMs, the basic building block is called a cell. A cell can be viewed as a black box that stores the hidden state $h_{t-1}$ accumulated before the current input $x_t$.
4. LSTMs have three kinds of gates: the forget gate, the input gate, and the output gate. The forget gate decides which parts of the cell state are discarded; the input gate decides which cells are updated; the output gate controls what is emitted. The current output therefore depends on the cell state and on the gates' filtering. Practice has shown that LSTMs can effectively solve long-range sequence dependencies. **The LSTM structure is shown below**.

![](./img/ch6/figure_6.6.6_1.png)

The differences between LSTMs and GRUs are shown in the figure:

![](./img/ch6/figure_6.6.6_2.png)

As the figure shows, the two structures are very similar. **The differences are**:

1. In both, the new memory is computed from the previous state and the input, but GRUs use a reset gate to control how much of the previous state flows in, whereas LSTMs have no comparable gate;
2. They produce the new state differently: LSTMs use two separate gates, the forget gate (f gate) and the input gate (i gate), whereas GRUs use a single update gate (z gate);
3. LSTMs can modulate the newly produced state through an output gate (o gate), whereas GRUs emit their output without any modulation.
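The gate differences listed above fit in a few lines of code. Below is a minimal NumPy sketch of one GRU step and one LSTM step (a simplification under assumptions: biases are omitted, each gate acts on the concatenated `[x_t, h_prev]`, and all shapes are toy values):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

d = 4                                   # toy hidden/input size (illustrative)
rng = np.random.default_rng(0)
p = lambda: rng.normal(scale=0.1, size=(d, 2 * d))  # weight for [x_t; h_prev]

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    xh = np.concatenate([x_t, h_prev])
    z = sigmoid(Wz @ xh)                # update gate: the GRU's single z gate
    r = sigmoid(Wr @ xh)                # reset gate: how much old state enters
    h_new = np.tanh(Wh @ np.concatenate([x_t, r * h_prev]))  # new memory content
    return (1 - z) * h_prev + z * h_new # no output modulation (difference 3)

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wo, Wc):
    xh = np.concatenate([x_t, h_prev])
    f = sigmoid(Wf @ xh)                # forget gate: which cell state to discard
    i = sigmoid(Wi @ xh)                # input gate: which cells get updated
    o = sigmoid(Wo @ xh)                # output gate: modulates the emitted state
    c = f * c_prev + i * np.tanh(Wc @ xh)   # cell state (two gates, difference 2)
    return o * np.tanh(c), c

x, h, c = rng.normal(size=d), np.zeros(d), np.zeros(d)
print("GRU h:", gru_step(x, h, p(), p(), p()))
print("LSTM h:", lstm_step(x, h, c, p(), p(), p(), p())[0])
```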
-### 6.6.7 Clockwork RNNs (CW-RNNs)
-CW-RNNs are a relatively recent RNN model; the paper appeared at ICML 2014 in Beijing. CW-RNNs are another improved RNN, driven by clock frequencies. The hidden layer is divided into several blocks (groups/modules), and each group processes its input at its own prescribed clock rate. To reduce the complexity of standard RNNs, CW-RNNs cut the number of parameters, which improves network performance and speeds up training. CW-RNNs handle long-term dependencies by running different hidden-layer modules at different clock rates: clock time is discretized, and at different time points different hidden groups are active, so not all groups work at every step, which accelerates training. Moreover, neurons in a group with a small clock period do not connect to neurons in a group with a larger period; only larger-period groups connect to smaller-period ones (think of inter-group connections as directed, representing the direction of information flow): larger-period groups are slow and smaller-period groups are fast, so the slow connect to the fast and not the other way round.

### 6.6.7 Bidirectional LSTMs

1. Like bidirectional RNNs, bidirectional LSTMs have two LSTM layers: one processes past information, the other future information.
2. In bidirectional LSTMs, a forward LSTM produces the forward hidden state and a backward LSTM produces the backward hidden state; the current hidden state is the combination of the two.

-CW-RNNs are structurally similar to SRNs: they also comprise an input layer, a hidden layer, and an output layer, with forward connections from input to hidden and from hidden to output. Unlike SRNs, the hidden neurons are partitioned into $g$ groups of $k$ neurons each, and each group is assigned a clock period $T_i\in\{T_1,T_2,...,T_g\}$. All neurons within a group are fully connected, while a recurrent connection from group $j$ to group $i$ requires $T_j$ greater than $T_i$. As shown in the figure, the groups are ordered from left to right by increasing clock period, i.e. $T_1<T_2<\dots<T_g$.

…used to record the modification log
 4. Revise or delete some of the formulas and statements in 9.9.3 (open to discussion); add paper links
 5. Revise or delete some of the formulas and statements in 9.9.4 (open to discussion); add paper links

<----qjhuang-2018-11-7---->
1. Revise some statements in the answer to 9.5 (open to discussion)

<----qjhuang-2018-11-9---->
1. Revise some of the answer formulas and links

Others ----> to be added
2. Revise the readme content
3. Revise the modify content

diff --git a/ch9_图像分割/第九章_图像分割.md b/ch9_图像分割/第九章_图像分割.md
index 4a406a7..0fd722d 100644
--- a/ch9_图像分割/第九章_图像分割.md
+++ b/ch9_图像分割/第九章_图像分割.md
@@ -219,9 +219,7 @@ learning rate: 0.001.

(2) The left half of the network is the contracting path: convolutions and max-pooling.

(3) The right half is the expanding path: the feature maps produced by upsampling are concatenated with the feature maps from the corresponding layer of the contracting path. (Pooling loses image information, lowers the image resolution, and is irreversible; this hurts image segmentation somewhat, while mattering little for image classification. So why upsample? Upsampling restores some of the image information, but the restoration is necessarily incomplete, so the result is further connected with the higher-resolution feature maps on the left, which are copied over and cropped to the size of the upsampled maps. This amounts to a trade-off between high resolution and more abstract features: as convolutions accumulate, the extracted features become more effective and more abstract, so the upsampled maps, having passed through many convolutions, are efficient and abstract, and they are concatenated with the less abstract but higher-resolution feature maps from the left.)

(4) Finally, two more deconvolution operations generate feature maps, and two 1x1 convolutions perform classification to obtain the final two heatmaps, e.g. the first holding the scores of class one and the second the scores of class two. These are fed into a softmax, the class with the larger probability is selected, and cross-entropy is back-propagated to train the network.
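The copy-crop-concatenate operation described in (3) can be sketched in a few lines of NumPy (an illustration, not the U-Net reference code: nearest-neighbour upsampling stands in for the learned up-convolution, and all channel counts and sizes are assumed):

```python
import numpy as np

def center_crop(enc, h, w):
    """Crop an encoder feature map (C, H, W) to the decoder's spatial size."""
    dh, dw = (enc.shape[1] - h) // 2, (enc.shape[2] - w) // 2
    return enc[:, dh:dh + h, dw:dw + w]

def upsample2x(x):
    """Nearest-neighbour 2x upsampling for a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

decoder = np.random.rand(64, 28, 28)        # low-res, more abstract features
encoder = np.random.rand(64, 64, 64)        # high-res features, contracting path

up = upsample2x(decoder)                    # 64x56x56: restores resolution, not information
skip = center_crop(encoder, *up.shape[1:])  # copy and crop from the contracting path
fused = np.concatenate([up, skip], axis=0)  # channel-wise concatenate: 128x56x56
print(fused.shape)
```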
@@ -368,11 +366,14 @@ The RefineNet block fuses feature maps from different resolution levels. The network structure is as follows:

The Residual Convolution Unit (RCU) is just an ordinary residual unit with BN removed;

Multi-resolution fusion first adapts each input feature map with a convolution layer (bringing all of them to the shape of the smallest feature map), then upsamples them and adds them element-wise. Note that for a single-input block such as RefineNet-4, this part is simply skipped;

The ReLU inside chained residual pooling is important for the effectiveness of the subsequent pooling, and it also makes the model less sensitive to changes in the learning rate. The chained structure gathers background context from a large region of the image. In addition, the block makes heavy use of identity-mapping connections, both short- and long-range, which allows gradients to propagate directly from any block to any other block;

Output convolutions simply add one more RCU before the output.

@@ -577,7 +578,8 @@ Why K masks? Giving each Class its own Mask effectively avoids inter-class competition

## **9.9 Applications of CNNs in weakly-supervised image segmentation**

Answer source: [CNN在基于弱监督学习的图像分割中的应用](https://zhuanlan.zhihu.com/p/23811946)

Recent deep-learning-based image segmentation generally relies on training a convolutional neural network (CNN), which requires a very large number of labeled images: in general, every training image must come with an accurate segmentation result.

If a learning algorithm could produce good segmentations by learning from coarsely labeled datasets, labeling the training data would become much simpler, greatly reducing the time spent on annotation. Such coarse labels can be:

1. stating only which objects an image contains;
2. giving a bounding box for an object;
3. labeling a few of the pixels in the object regions, e.g. by drawing lines or scribbles.

**9.9.1 Scribble labels**

ScribbleSup works in two steps. The first step propagates the class information from the scribbles to the other, unlabeled pixels, automatically completing the labeling of all training images; the second step trains a CNN on these labeled images. In the first step, the method first generates super-pixels and then labels all of them with a graph-cut method.
The Graph Cut energy function is:

$$
\sum_{i}\psi_i\left(y_i \mid X,S\right)+\sum_{i,j}\psi_{ij}\left(y_i,y_j,X\right)
$$

In this graph, each super-pixel is a node, and adjacent super-pixels are joined by an edge. The unary term of the energy has two sources: one comes from the scribbles, the other from the probability the CNN predicts for that super-pixel. The overall optimization is in fact a joint optimization over the graph-cut energy and the CNN parameters:

$$
\sum_{i}\psi_i^{scr}\left(y_i \mid X,S\right)+\sum_i -\log P\left(y_i \mid X,\theta\right)+\sum_{i,j}\psi_{ij}\left(y_i,y_j \mid X\right)
$$

The expression above is optimized by alternately solving for the optimal $Y$ and $\theta$. The paper found that about three iterations already give good results.
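The alternating optimization of $Y$ and $\theta$ can be sketched schematically. The toy below is not ScribbleSup itself: the CNN is replaced by a nearest-class-mean classifier over made-up super-pixel features, graph cut is replaced by a greedy minimization of the unary terms, and the pairwise smoothness term is omitted; it only illustrates the alternate-$Y$-then-$\theta$ loop and the roughly three iterations mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sp, n_classes, d = 100, 3, 5
feats = rng.normal(size=(n_sp, d))                  # per-super-pixel features (toy)
scribbled = rng.random(n_sp) < 0.1                  # ~10% of super-pixels scribbled
scribble_y = rng.integers(0, n_classes, size=n_sp)  # their (toy) labels

# scribble unary term: zero cost for the scribbled label, high cost otherwise
unary_scr = np.where(scribbled[:, None],
                     np.where(np.eye(n_classes)[scribble_y] == 1, 0.0, 10.0),
                     0.0)

log_p = np.full((n_sp, n_classes), np.log(1.0 / n_classes))  # untrained model
for _ in range(3):                                  # ~3 iterations, as in the paper
    # Step 1: fix theta, choose labels Y minimizing the unary energy
    # (scribble term plus the model's -log P term)
    y = np.argmin(unary_scr - log_p, axis=1)
    # Step 2: fix Y, refit the classifier (the "CNN") on the inferred labels
    means = np.stack([feats[y == c].mean(axis=0) if (y == c).any() else np.zeros(d)
                      for c in range(n_classes)])
    dist = ((feats[:, None, :] - means[None]) ** 2).sum(-1)
    dist -= dist.min(axis=1, keepdims=True)         # stabilize the softmax
    log_p = -dist - np.log(np.exp(-dist).sum(axis=1, keepdims=True))
print("label counts:", np.bincount(y, minlength=n_classes))
```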
**9.9.2 Image-level labels**

@@ -629,11 +632,11 @@ Deepak Pathak of UC Berkeley used a training set with only image-level labels

The method treats training as an optimization problem with linear constraints:

$$
\underset{\theta,P}{\text{minimize}} \quad D(P(X)\,\|\,Q(X\mid\theta))\\
\text{subject to} \quad A\overrightarrow{P}\geqslant \overrightarrow{b},\quad \sum_{X}P(X)=1
$$

The linear constraints come from the labels on the training data, e.g. an upper or lower bound on the expected number of foreground pixels of a class in an image (object size), or the requirement that the pixel count of some class in an image be 0, or at least 1, and so on. The objective can be converted into a loss function and then trained with SGD.

@@ -666,7 +669,7 @@

Paper: [Learning to Segment Under Various Forms of Weak Supervision (CVPR 2015)](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Xu_Learning_to_Segment_2015_CVPR_paper.pdf)

Jia Xu of the University of Wisconsin-Madison proposed a unified framework that handles all these types of weak labels: image-level labels, bounding boxes, and partial pixel labels such as scribbles. The method splits all training images into a total of $n$ super-pixels and extracts a $d$-dimensional feature vector for each. Since the class of each super-pixel is unknown, this amounts to unsupervised learning, so the method clusters all super-pixels using max-margin clustering (MMC); the objective function of this step is:

$$
\underset{W,H}{\min} \quad \frac{1}{2}\mathrm{tr}\left(W^TW\right) + \lambda\sum_{p=1}^{n}\sum_{c=1}^{C}\xi\left(w_c;x_p;h_p^c\right)