Closed Bug 594882 Opened 14 years ago Closed 14 years ago

URL classifier result ignored (malware site not blocked) because of bug 513008

Categories

(Core :: Networking: Cache, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+
status1.9.2 --- unaffected
status1.9.1 --- unaffected

People

(Reporter: mayhemer, Assigned: mayhemer)

References

Details

(Keywords: regression, Whiteboard: [sg:low])

Attachments

(2 files, 3 obsolete files)

I was trying to figure out what's happening in bug 589296.  One hour of brain debugging and I have a good theory what's going on there:

- nsHttpChannel is AsyncOpen'ed
- it calls Connect(true)
- it opens the cache entry, and always expects it asynchronously
- Connect returns NS_OK
- AsyncOpen starts the url classifier
- it immediately suspends the channel (nsHttpChannel::mSuspendCount = 1)
- before we get resumed by the classifier (before it finishes its job) we get OnCacheEntryAvailable
- it calls Connect(false)
- it creates the http transaction and its pump
- the transaction pump runs (we do AsyncRead, nsInputStreamPump::mSuspendCount = 0)
- it does the request and loads the response
- the response is a redirect, so we call the async redirect observers
- we call Suspend on the transaction: nsInputStreamPump::mSuspendCount = 1
- in the meantime classifier finishes its job and resumes us: nsHttpChannel::mSuspendCount = 0 and nsInputStreamPump::mSuspendCount = 0
- the transaction pump gets re-spin before we get the redirect callback
- it continues and finishes the channel load (we get OnStopRequest), we do a cleanup and cancel the channel
- then we get the redirect result, note that the error reported to the callback is NS_ERROR_INVALID_POINTER, quit strange
- we crash because the transaction has already been released

So, we have to postpone call to the cache entry callback, don't allow it sooner before we get resumed again.

Reporting as a new security bug, because Firefox 4 Beta 5 is not correctly classifying URLs before we start loading them.
If this gets blocking+, I can take it.
blocking2.0: --- → ?
And to explain why this bug was not present before patch for bug 513008 landed: an http channel was waiting for a cache entry asynchronously only when it was held by another channel.  And that cache entry could not be released sooner then the result from the classifier arrived, the time line was:

- nsHttpChannel 1 
- open and keep a cache entry (sync open)
- call to the classifier
- get suspended
                     - nsHttpChannel 2 
                     - open but wait for a cache entry (async)
                     - call to the classifier
                     - get suspended
- classifier is done
- get resumed
                     - get resumed (actually nothing to resume...)
- do a request, load a response
- release the cache entry
                     - get the cache entry
                     - load from cache
Your brain debugging is more efficient than mine - I think you're right on spot here. :)
Blocking.
blocking2.0: ? → betaN+
I am working on the patch + re-land of backout from bug 589296.
Attached patch v1 (obsolete) — Splinter Review
A general approach fix.  Whenever a channel wants to start the transaction pump or the cache pump while it is currently suspended, that start is delayed.  Pumps are then started when the channel gets resumed.

Now pushing to try server, done only some local tests and random browsing.  All seems to work correctly.
Attachment #474396 - Flags: review?(bzbarsky)
Hmm, do we not have tests to check if the URL classifier is working? Or did this just break in such a way that it managed to pass existing tests?

Worth adding a high-level test to ensure that visiting a test URL in the DB correctly results in an attack/phish page being shown?
(In reply to comment #6)
> A general approach fix.  Whenever a channel wants to start the transaction pump
> or the cache pump while it is currently suspended, that start is delayed. 
> Pumps are then started when the channel gets resumed.

Just an idea: Would it make sense to rather let the pump(s) "inherit" the value of mChannel::mSuspendCount when the channel creates them? I.e. make sure that a suspended channel creates suspended pumps in order to keep their suspendcount-semaphores in sync.
Comment on attachment 474396 [details] [diff] [review]
v1

Dropping review flag.  This patch regularly failed on try server.  Going to investigate and react to all comments when I know more.
Attachment #474396 - Flags: review?(bzbarsky)
(In reply to comment #7)
> Hmm, do we not have tests to check if the URL classifier is working? Or did
> this just break in such a way that it managed to pass existing tests?
> 
> Worth adding a high-level test to ensure that visiting a test URL in the DB
> correctly results in an attack/phish page being shown?

Bug 589585?  Excellent oversight of a security problem?
Blocks: 589585
(In reply to comment #10)
> Excellent oversight of a security problem?
Ah, I it's a blocking bug, so no oversight, just missing link to bug 513008, sorry.
Comment on attachment 474396 [details] [diff] [review]
v1

So, test failures were caused by non-closed cache entries when we failed resume.  That helped to fix the test timeouts.

However, before the patch for bug 513008 landed, we allowed to do the request as a prefetch but didn't allow load of the content to upper levels.  I wrongly assumed we even don't want to do the request before we check the URI ; that assumption is not correct, right?

So, we really need to sync the suspend counters on pumps to allow the prefetch but don't allow the content be processed, correct?
Attached patch v1.1 (obsolete) — Splinter Review
Updated v1 patch.  Just for a reference...
Attachment #474396 - Attachment is obsolete: true
Attached patch v2 (obsolete) — Splinter Review
What Bjarne suggested, much cleaner and safer patch, suspend the pump as many times as the channel has been suspended.

Pushing to try again.
Attachment #474516 - Flags: review?(bzbarsky)
Attached patch v2.1Splinter Review
Fixed missing return NS_OK in each method (only a warning on windows).

Tested with classifier thread slowed rapidly down (::Sleep()) that we get the intermittent failure from bug 589585 w/o this patch and not w/ the patch.

Passed try server.
Attachment #474516 - Attachment is obsolete: true
Attachment #474574 - Flags: review?(bzbarsky)
Attachment #474516 - Flags: review?(bzbarsky)
Group: core-security
Severity: blocker → major
Component: Security → Networking: Cache
Keywords: regression
QA Contact: toolkit → networking.cache
Summary: [Firefox 4 beta 5] URL classifier result ignored, vulnerable to site attacks, cause of bug 589296 → URL classifier result ignored (malware site not blocked) because of bug 589296
Whiteboard: [sg:low]
Do we need this on the older branches? The patch applies, but I guess without bug 513008 it's not biting us. Can we count on that or have we just gotten lucky?
status1.9.1: --- → ?
status1.9.2: --- → ?
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
No.  This is a change touching an old and sensitive code, we should not take this on stable branches.  Another reason is that the patch from bug 513008 that caused this bug has landed only on mozilla-central and is not planned to land on older branches, at least AFAIK.
Thanks biesi.

bz, do you want to check the patch as well or is Cristian's r+ enough?
Blocks: 513008
Comment on attachment 474574 [details] [diff] [review]
v2.1

biesi' r= is just fine here.  He knows this stuff way better than I do.
Attachment #474574 - Flags: review?(bzbarsky)
This what has to be landed.
Attachment #474515 - Attachment is obsolete: true
Summary: URL classifier result ignored (malware site not blocked) because of bug 589296 → URL classifier result ignored (malware site not blocked) because of bug 513008
Comment on attachment 475562 [details] [diff] [review]
v2.1 + re-land of backout from bug 589296 [Check-in comment 21]

http://hg.mozilla.org/mozilla-central/rev/ec6c7ae80489
Attachment #475562 - Attachment description: v2.1 + re-land of backout from bug 589296 → v2.1 + re-land of backout from bug 589296 [Check-in comment 21]
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
blocking1.9.1: ? → ---
blocking1.9.2: ? → ---
Depends on: 600371
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: