Sunday, January 17, 2016

Monday, April 13, 2015

all-about-critics

Github : https://github.com/carpedm20/all-about-critics

A Data Visualization of Korean movie critics. link

Why?

Tons of new movies are released every year, and the number of accumulated film reviews keeps growing. When we choose a movie to watch, we can judge the candidates by diverse criteria such as genre, director, or actors.

There is little doubt that reviews and star ratings are two of the most popular criteria. The NAVER movie information page shows this: star ratings from actual audiences, critics, and users are located right below the title of a movie.

In addition, one of the well-known movie recommendation services, Watcha, uses star ratings for its core recommendation system.

Anyone can post reviews or give star ratings, and we usually classify them by who made them: an ordinary user or a critic. According to Wikipedia, a critic is a professional who publishes opinions and assessments of various forms of creative work, such as movies. Critical judgments, whether derived from critical thinking or not, may be positive, negative, or balanced, weighing a combination of factors.

A review by a user, from here
A review by a critic, from here

Recently, however, many movie fans have been asking: "Are critics' reviews really worth considering?"

A sharp criticism of critics, from here
A disappointed comment about critics, from here

This is where I started this project. As a movie fan, I statistically compared the patterns of star ratings from ordinary users and critics. To generalize the analysis, I compared Korean critics with Metacritic and Rotten Tomatoes. The result can be read as evidence of whether skepticism about the quality of critics is justified.

Now, enjoy yourself :)

link

Screenshot

Acknowledgement

No negative or positive opinion about any specific critic is intended. The data is just a refinement of numbers, nothing else.

Copyright

2015 Kim Tae Hoon.

The MIT License (MIT)

[Flask] VoxOffice

Github : https://github.com/carpedm20/voxoffice

A Data Visualization of Box office history. link

Screenshot

VoxMusic

A Data Visualization of Music chart history. link

Copyright

2015 Kim Tae Hoon

The MIT License (MIT)

[Python] 컴공아 떠나자

Github : https://github.com/carpedm20/comgong-abroad

컴공아 떠나자 (a.k.a. Comgong Abroad) is a bot that automatically uploads the newest internship and recruitment announcements from overseas companies to Facebook.

Facebook page : link

Copyright

2015 Kim Tae Hoon

The MIT License (MIT)

Screenshots

* 2015.04.17 *

* 2015.04.13 *

Monday, August 18, 2014

[Python] emoji

Emoji

Emoji is a simple Python module.

This project was inspired by emoji of kyokomi.

Installation

To install emoji, simply:

$ pip install emoji

Example

from emoji import emojize

print(emojize("Python is :thumbsup:"))
print(emojize("Do you want some :beer: ?"))
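The idea behind the example can be illustrated without the library. The following is only a minimal sketch of alias substitution, not the module's actual implementation: the `ALIASES` table and the `emojize_sketch` function are hypothetical names introduced here, and the table holds just the two aliases used above.

```python
import re

# Hypothetical, tiny alias table; a real module would ship a much larger one.
ALIASES = {
    ":thumbsup:": "\U0001F44D",
    ":beer:": "\U0001F37A",
}

# An alias is a run of lowercase letters, digits, '_', '+' or '-' between colons.
ALIAS_PATTERN = re.compile(r":[a-z0-9_+\-]+:")


def emojize_sketch(text):
    """Replace known :alias: tokens with emoji; leave unknown tokens untouched."""
    return ALIAS_PATTERN.sub(lambda m: ALIASES.get(m.group(0), m.group(0)), text)


print(emojize_sketch("Python is :thumbsup:"))
print(emojize_sketch("Do you want some :beer: ?"))
```

Unknown aliases are passed through unchanged, so text that merely contains colons is not damaged.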

Demo

Link

Emoji Cheat Sheet

Author

Taehoon Kim / @carpedm20

Sunday, August 3, 2014

[Python] korail2

Korail2

Korail (www.letskorail.com) wrapper for Python.

This project was inspired by korail of devxoul.

korail is not working anymore because of a huge change in the Korail API.

Installing

To install korail2, simply:

$ pip install korail2

Or, you can use:

$ easy_install korail2

Or, you can also install manually:

$ git clone git://github.com/carpedm20/korail2.git
$ cd korail2
$ python setup.py install

Using

1. Login

First, you need to create a Korail object.

>>> from korail2 import Korail
>>> korail = Korail("12345678", YOUR_PASSWORD) # with membership number
>>> korail = Korail("carpedm20@gmail.com", YOUR_PASSWORD) # with email
>>> korail = Korail("010-9964-xxxx", YOUR_PASSWORD) # with phone number
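The three login forms above differ only in the shape of the identifier. As a rough sketch of how a caller might tell them apart before logging in, the patterns below are assumptions made for illustration (all-digit membership numbers, hyphenated phone numbers, a loose email check); `identifier_kind` is a hypothetical helper and is not part of korail2.

```python
import re

# Assumed shapes of the three identifier kinds; adjust to the real rules.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\d{3}-\d{3,4}-\d{4}$")
MEMBERSHIP_RE = re.compile(r"^\d+$")


def identifier_kind(user_id):
    """Return 'email', 'phone' or 'membership' for a login identifier."""
    if EMAIL_RE.match(user_id):
        return "email"
    if PHONE_RE.match(user_id):
        return "phone"
    if MEMBERSHIP_RE.match(user_id):
        return "membership"
    raise ValueError("unrecognized login identifier: %r" % user_id)


print(identifier_kind("12345678"))
print(identifier_kind("user@example.com"))
print(identifier_kind("010-1234-5678"))
```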

2. Search train

You can search train schedules with the search_train method. It takes these arguments:

dep : A departure station in Korean, e.g. '서울'
arr : An arrival station in Korean, e.g. '부산'
date : (optional) A departure date in yyyyMMdd format
time : (optional) A departure time in hhmmss format
train_type: (optional) A type of train
- 00: KTX
- 01: Saemaeul (새마을호)
- 02: Mugunghwa (무궁화호)
- 03: Commuter train (통근열차)
- 04: Nuriro (누리로)
- 05: All (default)
- 06: Airport express (공항직통)
- 07: KTX-Sancheon (KTX-산천)
- 08: ITX-Saemaeul (ITX-새마을)
- 09: ITX-Cheongchun (ITX-청춘)

Below is sample code for search_train:

>>> dep = '서울'
>>> arr = '동대구'
>>> date = '20140815'
>>> time = '144000'
>>> trains = korail.search_train(dep, arr, date, time)
[[KTX] 8월 3일, 서울~부산(11:00~13:42) [특실:1][일반실:1] 예약가능,
 [ITX-새마을] 8월 3일, 서울~부산(11:04~16:00) [일반실:1] 예약가능,
 [무궁화호] 8월 3일, 서울~부산(11:08~16:54) [일반실:0] 입석 역발매중,
 [ITX-새마을] 8월 3일, 서울~부산(11:50~16:50) [일반실:0] 입석 역발매중,
 [KTX] 8월 3일, 서울~부산(12:00~14:43) [특실:1][일반실:1] 예약가능,
 [KTX] 8월 3일, 서울~부산(12:30~15:13) [특실:1][일반실:1] 예약가능,
 [KTX] 8월 3일, 서울~부산(12:40~15:45) [특실:1][일반실:1] 예약가능,
 [KTX] 8월 3일, 서울~부산(12:55~15:26) [특실:1][일반실:1] 예약가능,
 [KTX] 8월 3일, 서울~부산(13:00~15:37) [특실:1][일반실:1] 예약가능,
 [KTX] 8월 3일, 서울~부산(13:10~15:58) [특실:1][일반실:1] 예약가능]
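The date and time strings do not have to be typed by hand. The sketch below builds them from a `datetime` in the yyyyMMdd and hhmmss formats described above and checks the train type code against the listed range 00 to 09. `build_search_args` and `VALID_TRAIN_TYPES` are hypothetical helpers, not part of korail2.

```python
from datetime import datetime

# Train type codes "00" through "09", as listed in the argument description.
VALID_TRAIN_TYPES = {"%02d" % code for code in range(10)}


def build_search_args(dep, arr, when=None, train_type="05"):
    """Assemble search arguments in the string formats described above."""
    if train_type not in VALID_TRAIN_TYPES:
        raise ValueError("unknown train type code: %r" % train_type)
    args = {"dep": dep, "arr": arr, "train_type": train_type}
    if when is not None:
        args["date"] = when.strftime("%Y%m%d")  # yyyyMMdd
        args["time"] = when.strftime("%H%M%S")  # hhmmss, 24-hour clock
    return args


args = build_search_args("서울", "동대구", datetime(2014, 8, 15, 14, 40))
print(args["date"], args["time"])  # 20140815 144000
```

If search_train accepts these names as keyword arguments (an assumption; the sample above passes them positionally), the result could be passed as korail.search_train(**args).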

3. Make a reservation

You can reserve a train with the reserve method.

>>> trains = korail.search_train(dep, arr, date, time)
>>> seat = korail.reserve(trains[0])
정상처리되었습니다
동일시간대 예약발매내역이 있습니다.
>>> seat
[KTX] 8월 3일, 서울~부산(11:00~:) 16호 6A

4. Get tickets

You can get your tickets with the tickets method.

>>> tickets = korail.tickets()
정상발매처리,정상발권처리
>>> tickets
[[KTX] 8월 10일, 동대구~울산(09:26~09:54) => 5호 4A, 13900원]

How I got the Korail API

Extract Korail apk from mobile phone

Decompile apk using dex2jar

Read the jar code using JD-GUI

Edit the smali code

Recompile a new Korail apk using apktool

Key signing with `motizen-sign`

Upload and run a new Korail apk

Capture packets and analyze the API

Todo

Distinguish adult and child

Make an option to select special seat or general seat when reserving

Make an option to reserve multiple seats at a time

Implement payment API

License

Source code is distributed under the BSD license.

Author

Taehoon Kim / @carpedm20

Saturday, August 2, 2014

[Django] UNIST Auction

UNIST Auction

Auction for UNIST

Copyright

2014 Kim Tae Hoon.

The MIT License (MIT)

[Python] LINE

LINE

May the LINE be with you...

The documentation is available here.

Screenshot

License

Source code is distributed under the BSD license.

Author

Taehoon Kim / @carpedm20

Wednesday, June 11, 2014

유니스트 버스 언제와? (When does the UNIST bus come?)

Github : https://github.com/carpedm20/chrome-unist-bus

Thursday, June 5, 2014

[Django] MovieTag

Find a movie to watch with any tags you want!

Tags are automatically generated with morpheme analysis of big data.

Percentage of positive and negative reviews will be given through deep learning.

Documentation

The documentation is available at ???

Development History

Planned a web service for searching movies by tag
- saw Steam's new game tag feature
- saw "Dining Code", a restaurant recommendation service that uses big data (reviews from blogs)
- wanted to find movies not by a broad category like Romance but by tags like "first love" or "farewell"
Movie review parsing
- saved the data as JSON
Morpheme analysis
- first, used lucene-korean-analyzer
- its weaknesses: it cannot distinguish predicates from uninflected words, and word frequencies are hard to get from reviews
- next, used mecab-ko and mecab-ko-dic
- these give detailed results from reviews, such as predicate and uninflected word information
Build a DB
- to connect with Django, wrote Python code that imports the JSON data into SQLite
- but file I/O was too slow, and multi-threaded code was impossible because of the file lock (estimated time to import all data was 6 days)
- changed the DB to MySQL
- faster file I/O, and multi-threaded code became possible (1~2 days)
- but sorting movies by a specific tag was too slow
- planned to use Apache Cassandra, but found through a Google search that its reads are slower than its writes
- the data was JSON, so used MongoDB
- the data import finished in only a few seconds with mongoimport (an "Assert failure on mongorestore (b.empty())" error occurred because of the huge JSON file, so the data was split into small files)
- DB querying was faster than with MySQL (hooray~)
- Conclusion: thanks to MongoDB's text indexing, raw queries appear much faster than MySQL + Django ORM
Build a web
- used the Django web framework
- Back-end: used Django, South, endless-pagination, etc.
- Front-end: used jQuery, Bootstrap, Bootstrap-twipsy, D3, Flat-UI, jQuery-Masonry, imagesloaded, etc.
- completed the tag search feature
- infinite scroll in development...
Positive & negative reviews
- plan to classify reviews as positive or negative from the review data, using logistic regression and deep learning
- first, build adjective and noun lists using morpheme analysis
- the star ratings of the review and of the movie will be used as labels for machine learning
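The planned positive/negative step can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's code: the English tokens stand in for the adjective and noun lists a morpheme analyzer would produce, star ratings are assumed to be on a 10-point scale, the threshold of 6 is a hypothetical labeling rule, and the logistic regression is trained with simple stochastic gradient descent. The deep learning stage is not shown.

```python
import math
from collections import Counter


def label_from_stars(stars, threshold=6):
    """Hypothetical rule: a rating at or above the threshold counts as positive."""
    return 1 if stars >= threshold else 0


def featurize(tokens, vocab):
    """Bag-of-words vector: how often each vocabulary word occurs in the review."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, vocab, lr=0.5, epochs=200):
    """Fit logistic regression weights with stochastic gradient descent."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for tokens, stars in samples:
            x = featurize(tokens, vocab)
            y = label_from_stars(stars)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def positive_probability(tokens, vocab, weights, bias):
    x = featurize(tokens, vocab)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Made-up reviews: (adjective/noun tokens, star rating of the review).
samples = [
    (["moving", "beautiful"], 9),
    (["boring", "predictable"], 3),
    (["beautiful", "story"], 8),
    (["boring", "story"], 2),
]
vocab = sorted({token for tokens, _ in samples for token in tokens})
weights, bias = train(samples, vocab)
print(positive_probability(["beautiful"], vocab, weights, bias) > 0.5)
print(positive_probability(["boring"], vocab, weights, bias) < 0.5)
```

Words that appear in both positive and negative reviews, such as "story" here, end up with weights near zero, which is the behavior one would want before moving on to word pairs.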

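Development History (translated from Korean)

Planned a service for searching movies by tag
- came across Steam's new game tag feature
- came across Dining Code, which recommends restaurants using big data (blog posts)
- wanted to find movies by keywords such as "first love" or "farewell" rather than by a broad category like romance
Movie review parsing
- saved as JSON files
- tags were first stored in the JSON as {"text": "첫사랑", "freq": 1}, but the structure was changed to {"첫사랑": 1} to avoid wasted queries ("첫사랑" means "first love")
Morpheme analysis
- at first, used lucene-korean-analyzer
- drawbacks: it cannot distinguish predicates from substantives, and the early version had no way to get word frequencies
- the next open-source tool used was mecab-ko with mecab-ko-dic
- advantages include results that classify predicates and substantives in detail
Building the DB
- at first, inserted the data into SQLite with Python code, for integration with the Django project
- importing into the DB (file I/O) was too slow, and a lock on file I/O made multithreading impossible (estimated to take about 6 days)
- switched the DB to MySQL
- inserts were far faster than with SQLite, and MySQL handles locking by itself even when running multiple threads (took 1~2 days)
- but sorting the movies for a specific tag by tag frequency was slow
- considered Apache Cassandra, but a brief Google search turned up a post saying its reads are slower than its writes; rejected because reads would be far more frequent
- used MongoDB, since the parsing output was already JSON
- the data went into the DB in a few seconds using mongoimport (the JSON file was so large that an "Assert failure on mongorestore (b.empty())" error occurred, so it was split into smaller pieces)
- DB querying became far faster (hooray!)
- Conclusion: thanks to MongoDB's text indexing feature, raw queries appear to be much faster than MySQL + Django ORM
Building the web
- used the Django web framework (had intended to study the MEAN stack this time, but gave up for the sake of fast development)
- Back-end: used Django, South, endless-pagination, etc.
- Front-end: used jQuery, Bootstrap, Bootstrap-twipsy, D3, Flat-UI, jQuery-Masonry, imagesloaded, etc.
- tag search feature completed
- infinite scroll feature in development...
Positive and negative reviews
- plan to analyze whether reviews are positive or negative from the parsed review data, first trying logistic regression on single adjectives and nouns and on adjective/noun pairs that frequently appear together, then using deep learning
- first, build noun and adjective lists from the reviews via morpheme analysis
- start learning based on which adjectives and nouns are used in each review
- the labels for learning will be the review's star rating and the movie's average star rating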
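The tag restructuring mentioned in the parsing notes, from a list of text/freq records to a single tag-to-frequency mapping, can be sketched as follows. This is an in-memory illustration only, not the MongoDB query the project used; the movie titles, the "title" and "tags" field names, and both helper functions are hypothetical.

```python
def compact_tags(tag_records):
    """Turn [{"text": tag, "freq": n}, ...] into {tag: n, ...}."""
    return {record["text"]: record["freq"] for record in tag_records}


def movies_by_tag(movies, tag):
    """Movies carrying the tag, most frequent first."""
    matches = [movie for movie in movies if tag in movie["tags"]]
    return sorted(matches, key=lambda movie: movie["tags"][tag], reverse=True)


# Made-up movies; the tags mean "first love" (첫사랑) and "farewell" (이별).
movies = [
    {"title": "Movie A", "tags": compact_tags([
        {"text": "첫사랑", "freq": 1},
        {"text": "이별", "freq": 4},
    ])},
    {"title": "Movie B", "tags": compact_tags([{"text": "첫사랑", "freq": 7}])},
    {"title": "Movie C", "tags": compact_tags([{"text": "이별", "freq": 2}])},
]

print([movie["title"] for movie in movies_by_tag(movies, "첫사랑")])
# ['Movie B', 'Movie A']
```

With the compact form, a tag's frequency is a single key lookup instead of a scan over a list of records, which is the kind of saving the restructuring was after.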

Screenshot

* 2014.06.04 *

* 2014.06.07 *

Github : https://github.com/carpedm20/movietag

