Speaker Deck
Features
Speaker Deck
Sign in
Sign up for free
Search
Search
MongoDB for Analytics
Presented at Mongo Chicago 2011.
John Nunemaker
October 18, 2011
Tweet
Share
More Decks by John Nunemaker
See All by John Nunemaker
Atom
jnunemaker
10
4.3k
MongoDB for Analytics
jnunemaker
11
930
Addicted to Stable
jnunemaker
32
2.6k
MongoDB for Analytics
jnunemaker
21
2.3k
Why You Should Never Use an ORM
jnunemaker
56
9.4k
Why NoSQL?
jnunemaker
10
940
Don't Repeat Yourself, Repeat Others
jnunemaker
7
3.4k
I Have No Talent
jnunemaker
14
960
Why MongoDB Is Awesome
jnunemaker
18
4.4k
Other Decks in Programming
See All in Programming
ktr0731/go-mcpでMCPサーバー作ってみた
takak2166
0
170
2度もゼロから書き直して、やっとブラウザでぬるぬる動くAIに辿り着いた話
tomoino
0
160
Go Modules: From Basics to Beyond / Go Modulesの基本とその先へ
kuro_kurorrr
0
120
AIネイティブなプロダクトをGolangで挑む取り組み
nmatsumoto4
0
120
技術懸念に立ち向かい 法改正を穏便に乗り切った話
pop_cashew
0
1.4k
Passkeys for Java Developers
ynojima
3
860
ドメインモデリングにおける抽象の役割、tagless-finalによるDSL構築、そして型安全な最適化
knih
11
1.9k
Bytecode Manipulation 으로 생산성 높이기
bigstark
2
350
カクヨムAndroidアプリのリブート
numeroanddev
0
430
Create a website using Spatial Web
akkeylab
0
290
型付きアクターモデルがもたらす分散シミュレーションの未来
piyo7
0
790
プロダクト開発でも使おう 関数のオーバーロード
yoiwamoto
0
150
Featured
See All Featured
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
130
19k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
780
Large-scale JavaScript Application Architecture
addyosmani
512
110k
Intergalactic Javascript Robots from Outer Space
tanoku
271
27k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
4
180
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
161
15k
It's Worth the Effort
3n
184
28k
Stop Working from a Prison Cell
hatefulcrawdad
269
20k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
20
1.3k
[RailsConf 2023] Rails as a piece of cake
palkan
55
5.6k
Site-Speed That Sticks
csswizardry
10
640
BBQ
matthewcrist
89
9.7k
Transcript
Ordered List John Nunemaker MongoChi 2011 October 18, 2011 MongoDB
for Analytics A loving conversation with @jnunemaker
Background As presented through interpretive dance
None
None
None
~1 month Of evenings and weekends
~4 dog years Since public launch
~6 tiny servers 2 web, 2 app, 2 db
~1-2 Million Page views per day
None
None
Implementation Imma show you how we do what we do
baby
Doing It Live No aggregate querying
get('/track.gif') do Hit.record(...) TrackGif end
class Hit def record site.atomic_update(site_updates) Resolution.record(self) Technology.record(self) Location.record(self) Referrer.record(self) Content.record(self)
Search.record(self) Notification.record(self) View.record(self) end end
class Resolution def record(hit) query = {'_id' => "..."} update
= {'$inc' => {}} update['$inc']["sx.#{hit.screenx}"] = 1 update['$inc']["bx.#{hit.browserx}"] = 1 update['$inc']["by.#{hit.browsery}"] = 1 collection(hit.created_on) .update(query, update, :upsert => true) end end end
Pros
Pros Space
Pros Space RAM
Pros Space RAM Reads
Pros Space RAM Reads Live
Cons
Cons Writes
Cons Writes Constraints
Cons Writes Constraints More Forethought
Cons Writes Constraints More Forethought No raw data
Time Frame Minute, hour, month, day, year, forever?
# of Variations One document vs many
Single Document Per Time Frame
None
{ "t" => 336381, "u" => 158951, "2011" => {
"02" => { "18" => { "t" => 9, "u" => 6 } } } }
{ '$inc' => { 't' => 1, 'u' => 1,
'2011.02.18.t' => 1, '2011.02.18.u' => 1, } }
Single Document For all ranges in time frame
None
{ "_id" =>"...:10", "bx" => { "320" => 85, "480"
=> 318, "800" => 1938, "1024" => 5033, "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137 }, "by" => { "480" => 2205, "600" => 7359,
"600" => 7359, "768" => 4515, "900" => 3833, "1024"
=> 2026 }, "sx" => { "320" => 191, "480" => 179, "800" => 195, "1024" => 1059, "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279 } }
{ '$inc' => { 'sx.1440' => 1, 'bx.1280' => 1,
'by.768' => 1, } }
Many Documents Search terms, content, referrers...
None
[ { "_id" => "<oid>:<hash>", "t" => "ruby class variables",
"sid" => BSON::ObjectId('<oid>'), "v" => 352 }, { "_id" => "<oid>:<hash>", "t" => "ruby unless", "sid" => BSON::ObjectId('<oid>'), "v" => 347 }, ]
Writes {'_id' => "#{site_id}:#{hash}"}
Reads [['sid', 1], ['v', -1]]
Growth The best laid plans of mice and men
Partition Hot Data Currently using collections for time frames
Bigger, Faster Server More CPU, RAM, Disk Space
Users Sites Content Referrers Terms Engines Resolutions Locations Users Sites
Content Referrers Terms Engines Resolutions Locations
Partition by Function Spread writes across a few servers
Users Sites Content Referrers Terms Engines Resolutions Locations
Partition by Server Spread writes across a ton of servers,
way down the road, not worried yet
Ordered List Thank you!
[email protected]
John Nunemaker MongoChi 2011 October
18, 2011 @jnunemaker