| Feature Name | Description |
|---|---|
| Creator Features | |
| Creator Name* | Username of the article creator |
| Creator Days on Site | How long has the article creator had a WP account? |
| Creator Num Edits | How many edits has the article creator made sitewide? |
| Creator Status | Creator’s account status (open or blocked) |
| Num other pages | Number of other kept pages created by the article creator |
| Userpage | Does the article creator have a userpage? |
| Page Features | |
| Title* | Article title |
| Createdate* | Date the article was created |
| File Size! | Size of the entire file, including all revisions |
| Has Talk Page! | Does the article have an accompanying talk page? |
| Topic Features | |
| Num links to here | Number of other Wikipedia articles linking to the article |
| Num links in from Web | Number of external web pages linking to the article |
| Pageviews** | Number of visits the page has had |
| Num Hits | Number of search engine results for the page title |
| Article Features | |
| Num categories | Number of WP categories the article belongs to |
| Num images | Number of images in the article |
| Num references | Number of references in the article |
| Num sections | Number of sections in the article |
| Num out Wikilinks | Number of links in the article to other WP pages |
| Infobox* | What kind of infobox does the article contain? |
| Total Size in Bytes | Length, in bytes, of the final version of the article |
| Revision Features | |
| Num Revisions | Number of revisions to the article |
| Num registered edits | Number of edits made by registered users |
| Num anonymous edits | Number of edits made by anonymous users |
| Num unique Editors | Number of unique users who edited the article |
| Time to Delete | Number of days between article creation and proposal for deletion |
| More than half anon | Boolean; were more than half of the edits made by anonymous users? |
| Has main editor | Boolean; has one user created > 50% of the content? |
| Creator is main editor | Boolean; is the article creator the main editor? |
| Likelihood Autobio | String similarity between the article title and the creator’s username |
| Text* | Bag of all words in the final version of the article |
| Language Features | |
| Normalized noun count | Number of nouns in the article, normalized by article length |
| Normalized verb count | Number of verbs in the article, normalized by article length |
| Normalized adjective count | Number of adjectives in the article, normalized by article length |
| Normalized adverb count | Number of adverbs in the article, normalized by article length |
| FK reading level | Flesch-Kincaid reading level of the article |
| SMOG reading level | SMOG reading level index |
| CL level | Coleman-Liau reading level index |
| Level avg | Average of 5 reading level indexes |
| FK reading ease | Flesch-Kincaid reading ease measure of the article |
* Not used for classification because these features are extremely sparse.
** Only used in the New dataset, because a MediaWiki software error miscounted pageviews in December 2011.
! Not used in the Original dataset classification because feature selection found that they reduced accuracy.
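Most Revision Features are simple aggregates over an article's revision history. The sketch below is one possible approach, not the pipeline that produced the table. The `Revision` record is a hypothetical layout, and anonymous editors are identified by IP address. "Content created" is approximated by the positive bytes each edit added, which is an assumption, since the table does not say how content authorship was measured.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Revision:
    """One revision of an article (hypothetical record layout)."""
    editor: Optional[str]  # username, or None for an anonymous edit
    ip: Optional[str]      # IP address recorded for anonymous edits
    bytes_added: int       # assumed proxy for content created by this edit


def revision_features(revisions, creator, created: date, proposed_for_deletion: date):
    # Registered editors are identified by username, anonymous ones by IP.
    ids = [r.editor if r.editor is not None else r.ip for r in revisions]
    anon = sum(1 for r in revisions if r.editor is None)

    # Credit each editor with the positive bytes they added (assumption).
    contributed = {}
    for ident, r in zip(ids, revisions):
        contributed[ident] = contributed.get(ident, 0) + max(r.bytes_added, 0)
    total = sum(contributed.values())

    # At most one editor can hold more than half of the content.
    main_editor = None
    for ident, amount in contributed.items():
        if total > 0 and amount > total / 2:
            main_editor = ident

    return {
        "num_revisions": len(revisions),
        "num_registered_edits": len(revisions) - anon,
        "num_anonymous_edits": anon,
        "num_unique_editors": len(set(ids)),
        "time_to_delete": (proposed_for_deletion - created).days,
        "more_than_half_anon": anon > len(revisions) / 2,
        "has_main_editor": main_editor is not None,
        "creator_is_main_editor": main_editor is not None and main_editor == creator,
    }
```

A revision-size or authorship-tracking measure could replace `bytes_added` without changing the rest of the logic.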
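The Likelihood Autobio feature is described only as a string similarity between the title and the creator's username, and the specific measure is not stated. This sketch substitutes Python's `difflib.SequenceMatcher.ratio()`. The normalization step, which ignores case, spaces, and underscores, is also an assumption. `likelihood_autobio` is a hypothetical name.

```python
from difflib import SequenceMatcher


def likelihood_autobio(title: str, username: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher stands in for the unspecified measure."""
    def normalize(s: str) -> str:
        # Ignore case, spaces and underscores so "John_Smith" matches "JohnSmith".
        return "".join(ch for ch in s.lower() if ch not in " _")

    return SequenceMatcher(None, normalize(title), normalize(username)).ratio()
```

Edit distance or Jaro-Winkler similarity would be reasonable alternatives. Any of them yields values near 1 when a user creates an article about themselves under their own name.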
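The normalized part-of-speech counts need a tagger, which is not specified. This sketch takes already-tagged `(word, tag)` pairs and assumes the Penn Treebank tag set, where noun tags start with `NN`, verb tags with `VB`, adjective tags with `JJ`, and adverb tags with `RB`. It also assumes that "article length" means the number of tokens. `normalized_pos_counts` is a hypothetical name.

```python
from collections import Counter

# Penn Treebank tag prefixes for each coarse part of speech (assumed tag set).
POS_PREFIXES = {"noun": "NN", "verb": "VB", "adjective": "JJ", "adverb": "RB"}


def normalized_pos_counts(tagged_tokens):
    """Count nouns, verbs, adjectives and adverbs, each divided by the token count."""
    n = len(tagged_tokens)
    counts = Counter()
    for _, tag in tagged_tokens:
        for pos, prefix in POS_PREFIXES.items():
            # e.g. NN, NNS, NNP and NNPS all count as nouns
            if tag.startswith(prefix):
                counts[pos] += 1
    return {pos: (counts[pos] / n if n else 0.0) for pos in POS_PREFIXES}
```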
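The reading-level features are standard formulas over simple text counts. The sketch below uses the widely published coefficients. Note that the ease measure the table calls "FK reading ease" is usually known as the Flesch reading-ease score. Counting syllables and sentences is left to the caller, because any automatic counter is heuristic. The function names are hypothetical.

```python
import math


def flesch_kincaid_grade(words, sentences, syllables):
    # Flesch-Kincaid grade level
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words, sentences, syllables):
    # Flesch reading ease: higher scores mean easier text
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def smog_grade(polysyllables, sentences):
    # polysyllables: number of words with three or more syllables
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


def coleman_liau(letters, words, sentences):
    L = letters / words * 100    # average letters per 100 words
    S = sentences / words * 100  # average sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8
```

For example, a 100-word passage with 5 sentences, 150 syllables, 10 polysyllabic words and 450 letters scores about 9.91 (Flesch-Kincaid grade), 59.6 (reading ease), 11.2 (SMOG) and 9.18 (Coleman-Liau).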