Python Ebook Reader/Extractor - only supports the EPUB format for now
pyebook extracts content and metadata from ebook files. It relies on the EPUB standards published by the IDPF (International Digital Publishing Forum).
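
In the EPUB standard, the file ``META-INF/container.xml`` points to the package (OPF) document, and the ``<metadata>`` element of that document carries Dublin Core fields such as ``dc:title`` and ``dc:creator``. The following is a minimal sketch of that lookup using only the Python standard library. It is not pyebook's own implementation, and the function name ``read_epub_metadata`` is hypothetical:

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the OCF container and OPF package specifications
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf",
          "dc": "http://purl.org/dc/elements/1.1/"}


def read_epub_metadata(path):
    """Return Dublin Core metadata from an EPUB file (hypothetical sketch)."""
    with zipfile.ZipFile(path) as epub:
        # META-INF/container.xml names the OPF package document
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf = ET.fromstring(epub.read(rootfile.get("full-path")))
    metadata = {}
    for field in ("date", "identifier", "creator", "language", "title"):
        # Simplification: keep only the first element of each kind,
        # although real books may repeat some fields (e.g. creator)
        element = opf.find(f"opf:metadata/dc:{field}", OPF_NS)
        metadata[field] = element.text if element is not None else None
    return metadata
```

Missing fields come back as ``None`` here. This is a design choice of the sketch, not documented pyebook behaviour.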

Install with pip::

    pip install --upgrade pyebook

Or with easy_install::

    easy_install -ZU pyebook

pyebook requires ``beautifulsoup4``.


Import the ``Book`` class::

    from pyebook.pyebook import Book

Initialize the book object from an EPUB file::

    my_book = Book('my_ebook_file.epub')

Access the book metadata::

    my_book.metadata

Load the book content::

    my_book.load_content()

Access the book content::

    my_book.content

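
``load_content()`` fills ``content`` with one entry per book part. This README does not describe how pyebook does this internally. The code is one plausible approach, assuming that parts follow the OPF ``<spine>`` reading order and that each part is a UTF-8 XHTML file. It uses the standard-library ``html.parser`` in place of beautifulsoup4. The names ``read_epub_content`` and ``html_to_text`` are hypothetical, and so is the use of the spine ``idref`` as ``part_name``:

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
SKIPPED_TAGS = {"head", "script", "style"}  # their text is not reading content


class _TextCollector(HTMLParser):
    """Collect text fragments, ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(source):
    """Hypothetical helper: flatten an (X)HTML string to space-joined text."""
    parser = _TextCollector()
    parser.feed(source)
    parser.close()
    return " ".join(parser.chunks)


def read_epub_content(path):
    """Return one dict per spine item, in reading order (hypothetical sketch)."""
    with zipfile.ZipFile(path) as epub:
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(epub.read(opf_path))
        # The spine references manifest items by id
        manifest = {item.get("id"): item.get("href")
                    for item in opf.findall("opf:manifest/opf:item", OPF_NS)}
        base = posixpath.dirname(opf_path)
        parts = []
        for itemref in opf.findall("opf:spine/opf:itemref", OPF_NS):
            idref = itemref.get("idref")
            # Manifest hrefs are relative to the OPF file's directory
            url = posixpath.join(base, manifest[idref])
            source = epub.read(url).decode("utf-8")  # assumes UTF-8 parts
            parts.append({
                "part_name": idref,
                "source_url": url,
                "content_source": source,
                "content_source_text": html_to_text(source),
            })
    return parts
```

Following the spine rather than the manifest keeps parts in reading order, because the manifest lists files in no guaranteed order.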

All returned objects are dictionaries or lists of dictionaries, structured as follows::

    # metadata (dictionary)
    {
        'date': <publication date>,
        'identifier': <book identifier (generally the ISBN)>,
        'creator': <book author>,
        'language': <book language>,
        'title': <book title>
    }

    # content (list of dictionaries, one per book part)
    [
        {
            'part_name': <book part name>,
            'source_url': <book part file URL>,
            'content_source': <book part HTML source code>,
            'content_source_text': <book part text content>
        }
    ]
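
Both structures are plain Python objects, so they can be consumed directly. The following is a minimal sketch, assuming the layouts documented above. ``summarize_book`` is a hypothetical helper, not part of pyebook, and the example data is made up:

```python
def summarize_book(metadata, content):
    """Build a plain-text summary from pyebook-style metadata and content.

    Hypothetical helper: assumes the dictionary layouts documented above.
    Missing or empty metadata fields are reported as 'unknown'.
    """
    title = metadata.get("title") or "unknown"
    creator = metadata.get("creator") or "unknown"
    # Count whitespace-separated words across the text of every part
    words = sum(len(part["content_source_text"].split()) for part in content)
    lines = [f"{title} by {creator}", f"{len(content)} parts, {words} words"]
    lines += [f"- {part['part_name']}" for part in content]
    return "\n".join(lines)


# Example with hand-written data in the documented shape
metadata = {"title": "Sample", "creator": "Jane Doe", "language": "en"}
content = [
    {"part_name": "chapter1", "source_url": "OEBPS/ch1.xhtml",
     "content_source": "<p>Hello world</p>", "content_source_text": "Hello world"},
    {"part_name": "chapter2", "source_url": "OEBPS/ch2.xhtml",
     "content_source": "<p>Goodbye</p>", "content_source_text": "Goodbye"},
]
print(summarize_book(metadata, content))
# prints:
# Sample by Jane Doe
# 2 parts, 3 words
# - chapter1
# - chapter2
```

With a real book, pass ``my_book.metadata`` and ``my_book.content`` (after ``my_book.load_content()``) instead of the example data.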